
Building a RAG-Powered Chat Agent with FileSearch

Build a documentation chatbot using the OpenAI Agents SDK FileSearchTool with vector stores, citation handling, and hybrid retrieval for production-grade RAG chat agents.

Why RAG Beats Pure Prompting

Large language models have broad knowledge but shallow depth on your specific domain. When a user asks about your product's pricing tiers, deployment requirements, or API rate limits, the model either hallucinates an answer or admits it does not know. Retrieval-Augmented Generation (RAG) solves this by searching your actual documents at query time and injecting relevant passages into the model's context.

The OpenAI Agents SDK includes FileSearchTool, which integrates with OpenAI's hosted vector stores to provide turnkey RAG. You upload documents, the platform chunks and embeds them, and the agent automatically searches them when answering questions. This guide walks through building a production documentation chatbot using FileSearch.

Vector Store Setup

Before the agent can search documents, we need to create a vector store and upload files. OpenAI handles chunking, embedding, and indexing automatically.

# setup_vector_store.py
import openai
import time
from pathlib import Path

client = openai.OpenAI()


def create_documentation_store(docs_dir: str, store_name: str) -> str:
    """Create a vector store and upload all documentation files."""

    # Create the vector store
    vector_store = client.vector_stores.create(
        name=store_name,
        expires_after={"anchor": "last_active_at", "days": 30},
    )
    print(f"Created vector store: {vector_store.id}")

    # Collect all documentation files
    doc_files = []
    for ext in ["*.md", "*.txt", "*.pdf", "*.html"]:
        doc_files.extend(Path(docs_dir).glob(f"**/{ext}"))

    if not doc_files:
        raise ValueError(f"No documentation files found in {docs_dir}")

    print(f"Found {len(doc_files)} documentation files")

    # Upload files in batches
    file_ids = []
    for doc_path in doc_files:
        with open(doc_path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="assistants")
            file_ids.append(uploaded.id)
        print(f"  Uploaded: {doc_path.name}")

    # Attach files to the vector store
    batch = client.vector_stores.file_batches.create(
        vector_store_id=vector_store.id,
        file_ids=file_ids,
    )

    # Wait for processing to complete
    while batch.status == "in_progress":
        time.sleep(2)
        batch = client.vector_stores.file_batches.retrieve(
            vector_store_id=vector_store.id,
            batch_id=batch.id,
        )
        print(f"  Processing: {batch.file_counts.completed}/{batch.file_counts.total}")

    print(f"Vector store ready: {vector_store.id}")
    return vector_store.id


if __name__ == "__main__":
    store_id = create_documentation_store(
        docs_dir="./documentation",
        store_name="product-docs-v1",
    )
    print(f"\nStore ID to use in agent config: {store_id}")

Building the RAG Agent

With the vector store created, we configure a chat agent that uses FileSearchTool to search documents before answering questions.

# agents/docs_agent.py
from agents import Agent
from agents.tool import FileSearchTool

# Use the vector store ID from setup
VECTOR_STORE_ID = "vs_abc123"  # replace with your actual ID

docs_agent = Agent(
    name="docs_agent",
    model="gpt-4o",
    instructions="""You are a documentation assistant for Acme Platform.

Your job is to answer user questions accurately based on the official documentation.

Rules:
- ALWAYS search the documentation before answering technical questions
- Cite specific sections when referencing documentation
- If the documentation does not cover the user's question, say so clearly
- Do not fabricate features, endpoints, or configuration options
- For ambiguous questions, ask for clarification before searching
- When multiple documents are relevant, synthesize information from all of them
- Include code examples from the docs when they are relevant to the question""",
    tools=[
        FileSearchTool(
            vector_store_ids=[VECTOR_STORE_ID],
            max_num_results=5,
        ),
    ],
)

The max_num_results parameter controls how many document chunks are retrieved per search. Five is a good default — enough to cover the topic but not so many that irrelevant results dilute the context.
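When tuning max_num_results, it helps to see what the tool actually retrieved. Recent versions of the openai-agents SDK expose an include_search_results flag on FileSearchTool that surfaces the retrieved chunks in the run items; verify the parameter exists in your SDK version before relying on it. A development-only sketch:

```python
# debug_agent.py -- sketch: a debug variant of the docs agent that surfaces
# retrieved chunks so you can judge whether max_num_results is well tuned.
# include_search_results is assumed to exist in your SDK version.
from agents import Agent
from agents.tool import FileSearchTool

debug_agent = Agent(
    name="docs_agent_debug",
    model="gpt-4o",
    instructions="Answer from the documentation only.",
    tools=[
        FileSearchTool(
            vector_store_ids=["vs_abc123"],  # replace with your store ID
            max_num_results=5,
            include_search_results=True,  # attach retrieved chunks to run output
        ),
    ],
)
```

Inspecting the attached search results for a handful of representative queries is the fastest way to decide whether five chunks is too few or too many for your corpus.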

Citation Handling

When the agent retrieves information from documents, the response often includes citation markers. Proper citation handling is critical for user trust — users need to verify that the agent's answers come from real documentation.


# citation_handler.py


def extract_citations(response_text: str, annotations: list) -> dict:
    """Extract and format citations from an agent response."""
    citations = {}

    for annotation in annotations:
        if hasattr(annotation, "file_citation"):
            cite = annotation.file_citation
            citation_key = annotation.text  # e.g., "【4:0†source】"
            citations[citation_key] = {
                "file_id": cite.file_id,
                "quote": getattr(cite, "quote", ""),
            }

    return citations


def format_response_with_citations(
    response_text: str,
    citations: dict,
    file_names: dict,  # file_id -> filename mapping
) -> str:
    """Replace citation markers with readable footnotes."""
    footnotes = []
    counter = 1

    for marker, cite_info in citations.items():
        file_id = cite_info["file_id"]
        filename = file_names.get(file_id, "unknown document")
        response_text = response_text.replace(marker, f"[{counter}]")
        footnotes.append(f"[{counter}] {filename}")
        counter += 1

    if footnotes:
        response_text += "\n\n---\n**Sources:**\n" + "\n".join(footnotes)

    return response_text
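To sanity-check the marker replacement without touching the API, the formatter logic can be exercised standalone with fake annotation data (the function is reproduced here so the snippet runs on its own):

```python
# Standalone check of the citation-formatting logic using fake data --
# no OpenAI calls required.

def format_response_with_citations(response_text, citations, file_names):
    footnotes = []
    counter = 1
    for marker, cite_info in citations.items():
        filename = file_names.get(cite_info["file_id"], "unknown document")
        response_text = response_text.replace(marker, f"[{counter}]")
        footnotes.append(f"[{counter}] {filename}")
        counter += 1
    if footnotes:
        response_text += "\n\n---\n**Sources:**\n" + "\n".join(footnotes)
    return response_text

text = "Rate limits are 100 req/min.【4:0†source】"
citations = {"【4:0†source】": {"file_id": "file-abc", "quote": "100 req/min"}}
names = {"file-abc": "rate-limits.md"}
print(format_response_with_citations(text, citations, names))
```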

FastAPI Integration with Citations

The API endpoint processes the agent's response and extracts citations for the frontend to display.

# main.py
from fastapi import FastAPI
from pydantic import BaseModel
from agents import Runner

from agents.docs_agent import docs_agent
from session_manager import SessionManager
from citation_handler import extract_citations, format_response_with_citations

app = FastAPI()
sessions = SessionManager()


class ChatRequest(BaseModel):
    session_id: str
    message: str


class Citation(BaseModel):
    index: int
    filename: str
    quote: str


class ChatResponse(BaseModel):
    response: str
    citations: list[Citation]


@app.post("/docs/chat", response_model=ChatResponse)
async def docs_chat(request: ChatRequest):
    session = sessions.get_or_create(request.session_id)
    session.add_message("user", request.message)

    result = await Runner.run(
        docs_agent,
        input=session.to_input_list(),
    )

    session.result = result
    raw_output = result.final_output

    # Extract citations from response annotations
    annotations = []
    for item in result.new_items:
        if hasattr(item, "annotations"):
            annotations.extend(item.annotations)

    citation_map = extract_citations(raw_output, annotations)

    # Build file name mapping (in production, cache this)
    import openai
    client = openai.OpenAI()
    file_names = {}
    for cite_info in citation_map.values():
        fid = cite_info["file_id"]
        if fid not in file_names:
            try:
                f = client.files.retrieve(fid)
                file_names[fid] = f.filename
            except Exception:
                file_names[fid] = "unknown"

    formatted = format_response_with_citations(
        raw_output, citation_map, file_names
    )

    citations_list = []
    for i, (marker, info) in enumerate(citation_map.items(), 1):
        citations_list.append(Citation(
            index=i,
            filename=file_names.get(info["file_id"], "unknown"),
            quote=info.get("quote", ""),
        ))

    session.add_message("assistant", formatted)

    return ChatResponse(response=formatted, citations=citations_list)
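The endpoint imports a SessionManager that is not shown in this guide. A minimal in-memory version, assuming the Runner accepts a list of role/content dicts as input (swap in Redis or a database for multi-process deployments), might look like this:

```python
# session_manager.py -- minimal in-memory session store (illustrative sketch;
# the guide assumes this module but does not define it)

class Session:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.messages: list[dict] = []
        self.result = None  # last Runner result, if the caller stores it

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def to_input_list(self) -> list[dict]:
        # Runner.run accepts a list of role/content dicts as conversation input
        return list(self.messages)


class SessionManager:
    def __init__(self):
        self._sessions: dict[str, Session] = {}

    def get_or_create(self, session_id: str) -> Session:
        if session_id not in self._sessions:
            self._sessions[session_id] = Session(session_id)
        return self._sessions[session_id]
```

This loses history on restart; it is only meant to make the endpoint above runnable end to end.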

Keeping the Vector Store Fresh

Documentation changes over time. A stale vector store produces outdated answers that erode user trust. Implement a refresh pipeline that syncs your documentation source with the vector store.

# refresh_vector_store.py
import openai
from pathlib import Path

client = openai.OpenAI()


def refresh_store(vector_store_id: str, docs_dir: str):
    """Refresh vector store by removing old files and uploading new ones."""

    # List existing files in the store
    existing = client.vector_stores.files.list(
        vector_store_id=vector_store_id
    )
    existing_ids = [f.id for f in existing.data]

    # Detach each file from the store and delete the underlying file object,
    # so stale uploads do not accumulate in file storage
    for fid in existing_ids:
        client.vector_stores.files.delete(
            vector_store_id=vector_store_id,
            file_id=fid,
        )
        client.files.delete(fid)

    # Upload fresh documents
    doc_files = []
    for ext in ["*.md", "*.txt", "*.pdf"]:
        doc_files.extend(Path(docs_dir).glob(f"**/{ext}"))

    new_ids = []
    for doc_path in doc_files:
        with open(doc_path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="assistants")
            new_ids.append(uploaded.id)

    # Attach new files
    client.vector_stores.file_batches.create(
        vector_store_id=vector_store_id,
        file_ids=new_ids,
    )

    print(f"Refreshed store with {len(new_ids)} files")

Run this as a scheduled job (cron, GitHub Action, or CI/CD step) whenever your documentation repository is updated. For high-velocity documentation, trigger it on every merge to the docs branch.

Best Practices for RAG Chat Agents

Chunk size matters. OpenAI's default chunking works well for most documentation, but if your documents have very long code blocks or tables, consider splitting them into smaller files before upload. Each chunk should be self-contained enough to answer a question on its own.
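One low-effort way to pre-split oversized documents is to break a long markdown file into one file per section heading before upload. A sketch that splits at every "## " heading (adjust the heading level to match your docs):

```python
# split_docs.py -- sketch: split a long markdown document into self-contained
# sections at each "## " heading before uploading to the vector store.
from pathlib import Path


def split_markdown(text: str) -> list[str]:
    """Split markdown into sections, starting a new one at every '## ' line."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def split_file(path: Path, out_dir: Path) -> list[Path]:
    """Write each section of `path` as its own file in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, section in enumerate(split_markdown(path.read_text())):
        out_path = out_dir / f"{path.stem}_{i:03d}.md"
        out_path.write_text(section)
        written.append(out_path)
    return written
```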

Prompt the agent to search first. Without explicit instructions, the model may attempt to answer from its training data instead of searching. The instruction "ALWAYS search the documentation before answering technical questions" forces the agent to use FileSearch on every query.

Handle "not found" gracefully. When the vector store returns no relevant results, the agent should say so rather than guessing. The instruction "If the documentation does not cover the user's question, say so clearly" prevents hallucinated answers.

Monitor retrieval quality. Log which queries return zero results or low-relevance results. These are gaps in your documentation that should be filled, or indicators that the user's vocabulary does not match your documentation's terminology.
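A sketch of such a monitoring check, assuming you capture per-chunk relevance scores somewhere (for example by logging search results when your SDK exposes them); the 0.4 threshold is an arbitrary placeholder to tune against your own data:

```python
# retrieval_audit.py -- sketch: flag queries whose retrieval came back empty
# or uniformly low-relevance, so documentation gaps surface in the logs.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag.retrieval")


def audit_retrieval(query: str, scores: list[float], threshold: float = 0.4) -> bool:
    """Return True (and log a warning) when retrieval looks weak for a query."""
    if not scores:
        log.warning("zero results for query: %r", query)
        return True
    if max(scores) < threshold:
        log.warning("low relevance (best=%.2f) for query: %r", max(scores), query)
        return True
    return False
```

Aggregating the flagged queries weekly gives you a ranked list of documentation gaps, or of user vocabulary that your docs never use.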

RAG-powered chat agents combine the natural language fluency of large language models with the factual grounding of your actual documentation. FileSearchTool makes the retrieval layer trivial to set up, letting you focus on the agent's instructions, citation handling, and user experience.

Written by CallSphere Team