Building a RAG-Powered Chat Agent with FileSearch
Build a documentation chatbot using the OpenAI Agents SDK FileSearchTool with vector stores, citation handling, and a refresh pipeline for production-grade RAG chat agents.
Why RAG Beats Pure Prompting
Large language models have broad knowledge but shallow depth on your specific domain. When a user asks about your product's pricing tiers, deployment requirements, or API rate limits, the model either hallucinates an answer or admits it does not know. Retrieval-Augmented Generation (RAG) solves this by searching your actual documents at query time and injecting relevant passages into the model's context.
The OpenAI Agents SDK includes FileSearchTool, which integrates with OpenAI's hosted vector stores to provide turnkey RAG. You upload documents, the platform chunks and embeds them, and the agent automatically searches them when answering questions. This guide walks through building a production documentation chatbot using FileSearch.
Vector Store Setup
Before the agent can search documents, we need to create a vector store and upload files. OpenAI handles chunking, embedding, and indexing automatically.
```mermaid
flowchart TD
    START["Building a RAG-Powered Chat Agent with FileSearch"] --> A
    A["Why RAG Beats Pure Prompting"]
    A --> B
    B["Vector Store Setup"]
    B --> C
    C["Building the RAG Agent"]
    C --> D
    D["Citation Handling"]
    D --> E
    E["FastAPI Integration with Citations"]
    E --> F
    F["Keeping the Vector Store Fresh"]
    F --> G
    G["Best Practices for RAG Chat Agents"]
    G --> DONE["Key Takeaways"]
    style START fill:#4f46e5,stroke:#4338ca,color:#fff
    style DONE fill:#059669,stroke:#047857,color:#fff
```
```python
# setup_vector_store.py
import time
from pathlib import Path

import openai

client = openai.OpenAI()


def create_documentation_store(docs_dir: str, store_name: str) -> str:
    """Create a vector store and upload all documentation files."""
    # Create the vector store
    vector_store = client.vector_stores.create(
        name=store_name,
        expires_after={"anchor": "last_active_at", "days": 30},
    )
    print(f"Created vector store: {vector_store.id}")

    # Collect all documentation files
    doc_files = []
    for ext in ["*.md", "*.txt", "*.pdf", "*.html"]:
        doc_files.extend(Path(docs_dir).glob(f"**/{ext}"))
    if not doc_files:
        raise ValueError(f"No documentation files found in {docs_dir}")
    print(f"Found {len(doc_files)} documentation files")

    # Upload each file to the Files API
    file_ids = []
    for doc_path in doc_files:
        with open(doc_path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="assistants")
        file_ids.append(uploaded.id)
        print(f"  Uploaded: {doc_path.name}")

    # Attach all files to the vector store in a single batch
    batch = client.vector_stores.file_batches.create(
        vector_store_id=vector_store.id,
        file_ids=file_ids,
    )

    # Poll until chunking, embedding, and indexing complete
    while batch.status == "in_progress":
        time.sleep(2)
        batch = client.vector_stores.file_batches.retrieve(
            vector_store_id=vector_store.id,
            batch_id=batch.id,
        )
        print(f"  Processing: {batch.file_counts.completed}/{batch.file_counts.total}")

    print(f"Vector store ready: {vector_store.id}")
    return vector_store.id


if __name__ == "__main__":
    store_id = create_documentation_store(
        docs_dir="./documentation",
        store_name="product-docs-v1",
    )
    print(f"\nStore ID to use in agent config: {store_id}")
```
Building the RAG Agent
With the vector store created, we configure a chat agent that uses FileSearchTool to search documents before answering questions.
```python
# agents/docs_agent.py
from agents import Agent
from agents.tool import FileSearchTool

# Use the vector store ID from setup
VECTOR_STORE_ID = "vs_abc123"  # replace with your actual ID

docs_agent = Agent(
    name="docs_agent",
    model="gpt-4o",
    instructions="""You are a documentation assistant for Acme Platform.
Your job is to answer user questions accurately based on the official documentation.

Rules:
- ALWAYS search the documentation before answering technical questions
- Cite specific sections when referencing documentation
- If the documentation does not cover the user's question, say so clearly
- Do not fabricate features, endpoints, or configuration options
- For ambiguous questions, ask for clarification before searching
- When multiple documents are relevant, synthesize information from all of them
- Include code examples from the docs when they are relevant to the question""",
    tools=[
        FileSearchTool(
            vector_store_ids=[VECTOR_STORE_ID],
            max_num_results=5,
        ),
    ],
)
```
The max_num_results parameter controls how many document chunks are retrieved per search. Five is a good default — enough to cover the topic but not so many that irrelevant results dilute the context.
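If you need to tune retrieval further, FileSearchTool also exposes options beyond `max_num_results`. The fields shown below (`include_search_results`, `ranking_options`, `score_threshold`) are a sketch — verify them against the `FileSearchTool` signature in your installed SDK version before relying on them:

```python
from agents.tool import FileSearchTool

search_tool = FileSearchTool(
    vector_store_ids=["vs_abc123"],  # same store ID as above
    max_num_results=5,
    include_search_results=True,  # surface retrieved chunks in the run output
    ranking_options={"score_threshold": 0.5},  # drop low-relevance chunks
)
```

Surfacing the raw search results is particularly useful during development, because it lets you see exactly which chunks the model answered from.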
Citation Handling
When the agent retrieves information from documents, the response often includes citation markers. Proper citation handling is critical for user trust — users need to verify that the agent's answers come from real documentation.
```python
# citation_handler.py

def extract_citations(response_text: str, annotations: list) -> dict:
    """Extract and format citations from an agent response."""
    citations = {}
    for annotation in annotations:
        if hasattr(annotation, "file_citation"):
            cite = annotation.file_citation
            citation_key = annotation.text  # e.g., "【4:0†source】"
            citations[citation_key] = {
                "file_id": cite.file_id,
                "quote": getattr(cite, "quote", ""),
            }
    return citations


def format_response_with_citations(
    response_text: str,
    citations: dict,
    file_names: dict,  # file_id -> filename mapping
) -> str:
    """Replace citation markers with readable footnotes."""
    footnotes = []
    counter = 1
    for marker, cite_info in citations.items():
        file_id = cite_info["file_id"]
        filename = file_names.get(file_id, "unknown document")
        response_text = response_text.replace(marker, f"[{counter}]")
        footnotes.append(f"[{counter}] {filename}")
        counter += 1
    if footnotes:
        response_text += "\n\n---\n**Sources:**\n" + "\n".join(footnotes)
    return response_text
```
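To sanity-check the substitution logic in isolation, here is an inlined, standalone version of the footnote step (the marker, file ID, and filename are made up for illustration):

```python
def add_footnotes(text: str, citations: dict, file_names: dict) -> str:
    """Replace each citation marker with [n] and append a Sources footer."""
    footnotes = []
    for i, (marker, info) in enumerate(citations.items(), 1):
        name = file_names.get(info["file_id"], "unknown document")
        text = text.replace(marker, f"[{i}]")
        footnotes.append(f"[{i}] {name}")
    if footnotes:
        text += "\n\n---\n**Sources:**\n" + "\n".join(footnotes)
    return text

result = add_footnotes(
    "The rate limit is 100 requests per minute.【4:0†source】",
    {"【4:0†source】": {"file_id": "file_123"}},
    {"file_123": "api-limits.md"},
)
print(result)
# The marker becomes [1] and "[1] api-limits.md" appears under Sources.
```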
FastAPI Integration with Citations
The API endpoint processes the agent's response and extracts citations for the frontend to display.
```python
# main.py
import openai
from fastapi import FastAPI
from pydantic import BaseModel

from agents import Runner
from agents.docs_agent import docs_agent
from citation_handler import extract_citations, format_response_with_citations
from session_manager import SessionManager

app = FastAPI()
sessions = SessionManager()
openai_client = openai.OpenAI()


class ChatRequest(BaseModel):
    session_id: str
    message: str


class Citation(BaseModel):
    index: int
    filename: str
    quote: str


class ChatResponse(BaseModel):
    response: str
    citations: list[Citation]


@app.post("/docs/chat", response_model=ChatResponse)
async def docs_chat(request: ChatRequest):
    session = sessions.get_or_create(request.session_id)
    session.add_message("user", request.message)

    result = await Runner.run(
        docs_agent,
        input=session.to_input_list(),
    )
    session.result = result
    raw_output = result.final_output

    # Collect annotations from the run items. File citations live on the text
    # parts nested inside each message item's raw_item, so walk that structure
    # defensively.
    annotations = []
    for item in result.new_items:
        raw = getattr(item, "raw_item", None)
        for part in getattr(raw, "content", None) or []:
            annotations.extend(getattr(part, "annotations", None) or [])

    citation_map = extract_citations(raw_output, annotations)

    # Build file name mapping (in production, cache this)
    file_names = {}
    for cite_info in citation_map.values():
        fid = cite_info["file_id"]
        if fid not in file_names:
            try:
                f = openai_client.files.retrieve(fid)
                file_names[fid] = f.filename
            except Exception:
                file_names[fid] = "unknown"

    formatted = format_response_with_citations(
        raw_output, citation_map, file_names
    )

    citations_list = [
        Citation(
            index=i,
            filename=file_names.get(info["file_id"], "unknown"),
            quote=info.get("quote", ""),
        )
        for i, (_, info) in enumerate(citation_map.items(), 1)
    ]

    session.add_message("assistant", formatted)
    return ChatResponse(response=formatted, citations=citations_list)
```
Keeping the Vector Store Fresh
Documentation changes over time. A stale vector store produces outdated answers that erode user trust. Implement a refresh pipeline that syncs your documentation source with the vector store.
```python
# refresh_vector_store.py
import openai
from pathlib import Path

client = openai.OpenAI()


def refresh_store(vector_store_id: str, docs_dir: str):
    """Refresh vector store by removing old files and uploading new ones."""
    # List existing files in the store
    existing = client.vector_stores.files.list(
        vector_store_id=vector_store_id
    )
    existing_ids = [f.id for f in existing.data]

    # Detach all existing files. Note: this removes them from the store but
    # does not delete the underlying file objects; also call
    # client.files.delete(fid) if you want to reclaim storage.
    for fid in existing_ids:
        client.vector_stores.files.delete(
            vector_store_id=vector_store_id,
            file_id=fid,
        )

    # Upload fresh documents
    doc_files = []
    for ext in ["*.md", "*.txt", "*.pdf"]:
        doc_files.extend(Path(docs_dir).glob(f"**/{ext}"))

    new_ids = []
    for doc_path in doc_files:
        with open(doc_path, "rb") as f:
            uploaded = client.files.create(file=f, purpose="assistants")
        new_ids.append(uploaded.id)

    # Attach new files
    client.vector_stores.file_batches.create(
        vector_store_id=vector_store_id,
        file_ids=new_ids,
    )
    print(f"Refreshed store with {len(new_ids)} files")
```
Run this as a scheduled job (cron, GitHub Action, or CI/CD step) whenever your documentation repository is updated. For high-velocity documentation, trigger it on every merge to the docs branch.
Best Practices for RAG Chat Agents
Chunk size matters. OpenAI's default chunking works well for most documentation, but if your documents have very long code blocks or tables, consider splitting them into smaller files before upload. Each chunk should be self-contained enough to answer a question on its own.
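One way to do that pre-splitting, assuming your docs are markdown: cut each file on `## ` headings, then merge adjacent sections back together so every uploaded piece stays under a target size while each heading keeps its body intact. The helper name and size threshold here are illustrative:

```python
def split_markdown(text: str, max_chars: int = 4000) -> list[str]:
    """Split a markdown doc on '## ' headings, then merge adjacent sections
    so each output piece stays near max_chars and remains self-contained."""
    # Cut the document at every second-level heading
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("## ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Greedily merge neighbors until adding one more would exceed max_chars
    merged, buf = [], ""
    for section in sections:
        if buf and len(buf) + len(section) > max_chars:
            merged.append(buf)
            buf = ""
        buf += section
    if buf:
        merged.append(buf)
    return merged
```

Write each returned piece to its own file before upload; because the function only partitions the text, concatenating the pieces reproduces the original document exactly.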
Prompt the agent to search first. Without explicit instructions, the model may attempt to answer from its training data instead of searching. The instruction "ALWAYS search the documentation before answering technical questions" strongly biases the agent toward using FileSearch on every technical query — instructions are not a hard guarantee, but in practice they make skipped searches rare.
Handle "not found" gracefully. When the vector store returns no relevant results, the agent should say so rather than guessing. The instruction "If the documentation does not cover the user's question, say so clearly" prevents hallucinated answers.
Monitor retrieval quality. Log which queries return zero results or low-relevance results. These are gaps in your documentation that should be filled, or indicators that the user's vocabulary does not match your documentation's terminology.
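A minimal in-process version of that monitoring is sketched below. The result count is assumed to come from however you surface search results (for example, FileSearchTool's `include_search_results` option, if your SDK version exposes it); the class name and thresholds are illustrative:

```python
import collections
import time


class RetrievalLog:
    """Record queries whose retrieval looks weak, for later doc-gap review."""

    def __init__(self, min_results: int = 1, capacity: int = 1000):
        self.min_results = min_results
        # Bounded buffer so the log cannot grow without limit
        self.weak_queries = collections.deque(maxlen=capacity)

    def record(self, query: str, num_results: int) -> None:
        """Flag the query if retrieval returned fewer results than expected."""
        if num_results < self.min_results:
            self.weak_queries.append(
                {"query": query, "results": num_results, "ts": time.time()}
            )

    def report(self) -> list[dict]:
        return list(self.weak_queries)


log = RetrievalLog(min_results=2)
log.record("how do I rotate API keys?", 0)    # gap: flagged
log.record("what are the pricing tiers?", 5)  # healthy: ignored
```

In production you would persist these records (database, log pipeline) rather than hold them in memory, but the signal is the same: recurring weak queries mark either missing documentation or a vocabulary mismatch worth fixing.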
RAG-powered chat agents combine the natural language fluency of large language models with the factual grounding of your actual documentation. FileSearchTool makes the retrieval layer trivial to set up, letting you focus on the agent's instructions, citation handling, and user experience.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.