Building a RAG-Powered Chat Agent with FileSearch
Build a documentation chatbot using the OpenAI Agents SDK FileSearchTool with vector stores, citation handling, and a refresh pipeline for production-grade RAG chat agents.
Why RAG Beats Pure Prompting
Large language models have broad knowledge but shallow depth on your specific domain. When a user asks about your product's pricing tiers, deployment requirements, or API rate limits, the model either hallucinates an answer or admits it does not know. Retrieval-Augmented Generation (RAG) solves this by searching your actual documents at query time and injecting relevant passages into the model's context.
The OpenAI Agents SDK includes FileSearchTool, which integrates with OpenAI's hosted vector stores to provide turnkey RAG. You upload documents, the platform chunks and embeds them, and the agent automatically searches them when answering questions. This guide walks through building a production documentation chatbot using FileSearch.
Vector Store Setup
Before the agent can search documents, we need to create a vector store and upload files. OpenAI handles chunking, embedding, and indexing automatically.
```mermaid
flowchart TD
    START["Building a RAG-Powered Chat Agent with FileSearch"] --> A
    A["Why RAG Beats Pure Prompting"]
    A --> B
    B["Vector Store Setup"]
    B --> C
    C["Building the RAG Agent"]
    C --> D
    D["Citation Handling"]
    D --> E
    E["FastAPI Integration with Citations"]
    E --> F
    F["Keeping the Vector Store Fresh"]
    F --> G
    G["Best Practices for RAG Chat Agents"]
    G --> DONE["Key Takeaways"]
    style START fill:#4f46e5,stroke:#4338ca,color:#fff
    style DONE fill:#059669,stroke:#047857,color:#fff
```
```python
# setup_vector_store.py
import time
from pathlib import Path

import openai

client = openai.OpenAI()


def create_documentation_store(docs_dir: str, store_name: str) -> str:
    """Create a vector store and upload all documentation files."""
    # Create the vector store
    vector_store = client.vector_stores.create(
        name=store_name,
        expires_after={"anchor": "last_active_at", "days": 30},
    )
    print(f"Created vector store: {vector_store.id}")

    # Collect all documentation files
    doc_files = []
    for ext in ["*.md", "*.txt", "*.pdf", "*.html"]:
        doc_files.extend(Path(docs_dir).glob(f"**/{ext}"))
    if not doc_files:
        raise ValueError(f"No documentation files found in {docs_dir}")
    print(f"Found {len(doc_files)} documentation files")

    # Upload each file to the Files API
    file_ids = []
    for doc_path in doc_files:
        with open(doc_path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="assistants")
        file_ids.append(uploaded.id)
        print(f"  Uploaded: {doc_path.name}")

    # Attach all files to the vector store in a single batch
    batch = client.vector_stores.file_batches.create(
        vector_store_id=vector_store.id,
        file_ids=file_ids,
    )

    # Poll until chunking, embedding, and indexing complete
    while batch.status == "in_progress":
        time.sleep(2)
        batch = client.vector_stores.file_batches.retrieve(
            vector_store_id=vector_store.id,
            batch_id=batch.id,
        )
        print(f"  Processing: {batch.file_counts.completed}/{batch.file_counts.total}")

    print(f"Vector store ready: {vector_store.id}")
    return vector_store.id


if __name__ == "__main__":
    store_id = create_documentation_store(
        docs_dir="./documentation",
        store_name="product-docs-v1",
    )
    print(f"\nStore ID to use in agent config: {store_id}")
```
Building the RAG Agent
With the vector store created, we configure a chat agent that uses FileSearchTool to search documents before answering questions.
```python
# agents/docs_agent.py
from agents import Agent
from agents.tool import FileSearchTool

# Use the vector store ID from setup
VECTOR_STORE_ID = "vs_abc123"  # replace with your actual ID

docs_agent = Agent(
    name="docs_agent",
    model="gpt-4o",
    instructions="""You are a documentation assistant for Acme Platform.
Your job is to answer user questions accurately based on the official documentation.

Rules:
- ALWAYS search the documentation before answering technical questions
- Cite specific sections when referencing documentation
- If the documentation does not cover the user's question, say so clearly
- Do not fabricate features, endpoints, or configuration options
- For ambiguous questions, ask for clarification before searching
- When multiple documents are relevant, synthesize information from all of them
- Include code examples from the docs when they are relevant to the question""",
    tools=[
        FileSearchTool(
            vector_store_ids=[VECTOR_STORE_ID],
            max_num_results=5,
        ),
    ],
)
```
The max_num_results parameter controls how many document chunks are retrieved per search. Five is a good default — enough to cover the topic but not so many that irrelevant results dilute the context.
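If you need to tune retrieval further, FileSearchTool also exposes options beyond `max_num_results`. The fields shown below (`include_search_results`, `ranking_options`, `score_threshold`) are a sketch — verify them against the `FileSearchTool` signature in your installed SDK version before relying on them:

```python
from agents.tool import FileSearchTool

search_tool = FileSearchTool(
    vector_store_ids=["vs_abc123"],  # same store ID as above
    max_num_results=5,
    include_search_results=True,  # surface retrieved chunks in the run output
    ranking_options={"score_threshold": 0.5},  # drop low-relevance chunks
)
```

Surfacing the raw search results is particularly useful during development, because it lets you see exactly which chunks the model answered from.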
Citation Handling
When the agent retrieves information from documents, the response often includes citation markers. Proper citation handling is critical for user trust — users need to verify that the agent's answers come from real documentation.
```python
# citation_handler.py

def extract_citations(response_text: str, annotations: list) -> dict:
    """Extract and format citations from an agent response."""
    citations = {}
    for annotation in annotations:
        if hasattr(annotation, "file_citation"):
            cite = annotation.file_citation
            citation_key = annotation.text  # e.g., "【4:0†source】"
            citations[citation_key] = {
                "file_id": cite.file_id,
                "quote": getattr(cite, "quote", ""),
            }
    return citations


def format_response_with_citations(
    response_text: str,
    citations: dict,
    file_names: dict,  # file_id -> filename mapping
) -> str:
    """Replace citation markers with readable footnotes."""
    footnotes = []
    counter = 1
    for marker, cite_info in citations.items():
        file_id = cite_info["file_id"]
        filename = file_names.get(file_id, "unknown document")
        response_text = response_text.replace(marker, f"[{counter}]")
        footnotes.append(f"[{counter}] {filename}")
        counter += 1
    if footnotes:
        response_text += "\n\n---\n**Sources:**\n" + "\n".join(footnotes)
    return response_text
```
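To sanity-check the substitution logic in isolation, here is an inlined, standalone version of the footnote step (the marker, file ID, and filename are made up for illustration):

```python
def add_footnotes(text: str, citations: dict, file_names: dict) -> str:
    """Replace each citation marker with [n] and append a Sources footer."""
    footnotes = []
    for i, (marker, info) in enumerate(citations.items(), 1):
        name = file_names.get(info["file_id"], "unknown document")
        text = text.replace(marker, f"[{i}]")
        footnotes.append(f"[{i}] {name}")
    if footnotes:
        text += "\n\n---\n**Sources:**\n" + "\n".join(footnotes)
    return text

result = add_footnotes(
    "The rate limit is 100 requests per minute.【4:0†source】",
    {"【4:0†source】": {"file_id": "file_123"}},
    {"file_123": "api-limits.md"},
)
print(result)
# The marker becomes [1] and "[1] api-limits.md" appears under Sources.
```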
FastAPI Integration with Citations
The API endpoint processes the agent's response and extracts citations for the frontend to display.
```python
# main.py
import openai
from fastapi import FastAPI
from pydantic import BaseModel

from agents import Runner
from agents.docs_agent import docs_agent
from citation_handler import extract_citations, format_response_with_citations
from session_manager import SessionManager

app = FastAPI()
sessions = SessionManager()
openai_client = openai.OpenAI()


class ChatRequest(BaseModel):
    session_id: str
    message: str


class Citation(BaseModel):
    index: int
    filename: str
    quote: str


class ChatResponse(BaseModel):
    response: str
    citations: list[Citation]


@app.post("/docs/chat", response_model=ChatResponse)
async def docs_chat(request: ChatRequest):
    session = sessions.get_or_create(request.session_id)
    session.add_message("user", request.message)

    result = await Runner.run(
        docs_agent,
        input=session.to_input_list(),
    )
    session.result = result
    raw_output = result.final_output

    # Collect annotations from the run items. File citations live on the text
    # parts nested inside each message item's raw_item, so walk that structure
    # defensively.
    annotations = []
    for item in result.new_items:
        raw = getattr(item, "raw_item", None)
        for part in getattr(raw, "content", None) or []:
            annotations.extend(getattr(part, "annotations", None) or [])

    citation_map = extract_citations(raw_output, annotations)

    # Build file name mapping (in production, cache this)
    file_names = {}
    for cite_info in citation_map.values():
        fid = cite_info["file_id"]
        if fid not in file_names:
            try:
                f = openai_client.files.retrieve(fid)
                file_names[fid] = f.filename
            except Exception:
                file_names[fid] = "unknown"

    formatted = format_response_with_citations(
        raw_output, citation_map, file_names
    )

    citations_list = [
        Citation(
            index=i,
            filename=file_names.get(info["file_id"], "unknown"),
            quote=info.get("quote", ""),
        )
        for i, (_, info) in enumerate(citation_map.items(), 1)
    ]

    session.add_message("assistant", formatted)
    return ChatResponse(response=formatted, citations=citations_list)
```
Keeping the Vector Store Fresh
Documentation changes over time. A stale vector store produces outdated answers that erode user trust. Implement a refresh pipeline that syncs your documentation source with the vector store.
```python
# refresh_vector_store.py
import openai
from pathlib import Path

client = openai.OpenAI()


def refresh_store(vector_store_id: str, docs_dir: str):
    """Refresh vector store by removing old files and uploading new ones."""
    # List existing files in the store
    existing = client.vector_stores.files.list(
        vector_store_id=vector_store_id
    )
    existing_ids = [f.id for f in existing.data]

    # Detach all existing files. Note: this removes them from the store but
    # does not delete the underlying file objects; also call
    # client.files.delete(fid) if you want to reclaim storage.
    for fid in existing_ids:
        client.vector_stores.files.delete(
            vector_store_id=vector_store_id,
            file_id=fid,
        )

    # Upload fresh documents
    doc_files = []
    for ext in ["*.md", "*.txt", "*.pdf"]:
        doc_files.extend(Path(docs_dir).glob(f"**/{ext}"))

    new_ids = []
    for doc_path in doc_files:
        with open(doc_path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="assistants")
        new_ids.append(uploaded.id)

    # Attach new files
    client.vector_stores.file_batches.create(
        vector_store_id=vector_store_id,
        file_ids=new_ids,
    )
    print(f"Refreshed store with {len(new_ids)} files")
```
Run this as a scheduled job (cron, GitHub Action, or CI/CD step) whenever your documentation repository is updated. For high-velocity documentation, trigger it on every merge to the docs branch.
Best Practices for RAG Chat Agents
Chunk size matters. OpenAI's default chunking works well for most documentation, but if your documents have very long code blocks or tables, consider splitting them into smaller files before upload. Each chunk should be self-contained enough to answer a question on its own.
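One way to do that pre-splitting, assuming your docs are markdown: cut each file on `## ` headings, then merge adjacent sections back together so every uploaded piece stays under a target size while each heading keeps its body intact. The helper name and size threshold here are illustrative:

```python
def split_markdown(text: str, max_chars: int = 4000) -> list[str]:
    """Split a markdown doc on '## ' headings, then merge adjacent sections
    so each output piece stays near max_chars and remains self-contained."""
    # Cut the document at every second-level heading
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("## ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Greedily merge neighbors until adding one more would exceed max_chars
    merged, buf = [], ""
    for section in sections:
        if buf and len(buf) + len(section) > max_chars:
            merged.append(buf)
            buf = ""
        buf += section
    if buf:
        merged.append(buf)
    return merged
```

Write each returned piece to its own file before upload; because the function only partitions the text, concatenating the pieces reproduces the original document exactly.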
Prompt the agent to search first. Without explicit instructions, the model may attempt to answer from its training data instead of searching. The instruction "ALWAYS search the documentation before answering technical questions" strongly biases the agent toward using FileSearch on every technical query — instructions are not a hard guarantee, but in practice they make skipped searches rare.
Handle "not found" gracefully. When the vector store returns no relevant results, the agent should say so rather than guessing. The instruction "If the documentation does not cover the user's question, say so clearly" prevents hallucinated answers.
Monitor retrieval quality. Log which queries return zero results or low-relevance results. These are gaps in your documentation that should be filled, or indicators that the user's vocabulary does not match your documentation's terminology.
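A minimal in-process version of that monitoring is sketched below. The result count is assumed to come from however you surface search results (for example, FileSearchTool's `include_search_results` option, if your SDK version exposes it); the class name and thresholds are illustrative:

```python
import collections
import time


class RetrievalLog:
    """Record queries whose retrieval looks weak, for later doc-gap review."""

    def __init__(self, min_results: int = 1, capacity: int = 1000):
        self.min_results = min_results
        # Bounded buffer so the log cannot grow without limit
        self.weak_queries = collections.deque(maxlen=capacity)

    def record(self, query: str, num_results: int) -> None:
        """Flag the query if retrieval returned fewer results than expected."""
        if num_results < self.min_results:
            self.weak_queries.append(
                {"query": query, "results": num_results, "ts": time.time()}
            )

    def report(self) -> list[dict]:
        return list(self.weak_queries)


log = RetrievalLog(min_results=2)
log.record("how do I rotate API keys?", 0)    # gap: flagged
log.record("what are the pricing tiers?", 5)  # healthy: ignored
```

In production you would persist these records (database, log pipeline) rather than hold them in memory, but the signal is the same: recurring weak queries mark either missing documentation or a vocabulary mismatch worth fixing.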
RAG-powered chat agents combine the natural language fluency of large language models with the factual grounding of your actual documentation. FileSearchTool makes the retrieval layer trivial to set up, letting you focus on the agent's instructions, citation handling, and user experience.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.