IT Helpdesk RAG with ChromaDB: CallSphere 10 Agents vs Vapi
CallSphere's U Rack IT ships ChromaDB-backed RAG plus 10 IT specialist agents. Vapi has knowledge-base PDF upload but no specialist agent for it.
TL;DR
CallSphere's U Rack IT product ships 10 IT specialist agents (Triage, Device, Ticket, Network, Email, Computer, Printer, Phone, Security, Lookup) and a ChromaDB-backed RAG layer with the dedicated Lookup agent as the retrieval specialist. Vapi.ai offers PDF knowledge-base upload — useful but generic — and gives you a single voice agent to query it. There is no specialist taxonomy, no Lookup agent, no 40+ table data model, no role-based dashboard for Admin/Agent/Requester, and no MSP-aware ticketing schema. This post is the architecture comparison and a worked example of an L1 password reset RAG retrieval.
The MSP Problem: L1 Volume Crushes Margins
Managed Service Providers (MSPs) live and die on L1 ticket margin. Service Leadership Index 2025 reports that 62% of MSP ticket volume is L1 — password resets, printer jams, basic network connectivity, email setup, software install — and that L1 tickets average $23 in fully loaded support cost while billing $18-22 per ticket. The math is brutal: every L1 ticket the MSP touches with a human is a margin loser.
The fix every MSP CEO has been chasing for five years is L1 voice automation with knowledge-base retrieval. It is not a chatbot — clients call. It is not a generic voice agent — a printer ticket needs different troubleshooting than a network ticket. It is not a single RAG endpoint — the agent needs to call retrieval as a tool, multiple times, with reranking. It is a structured multi-agent system with a knowledge layer.
That is exactly what U Rack IT is.
The U Rack IT Architecture
U Rack IT is built for IT, MSP, and enterprise helpdesks. The stack:
- Backend: Python FastAPI with OpenAI Realtime API + Agents SDK; NestJS + Prisma for the API layer.
- Frontend: React + Tailwind, role-based dashboard (Admin / Agent / Requester).
- Database: PostgreSQL + Supabase + ChromaDB.
- 40+ DB models: account_managers, organizations, contacts, devices, support_tickets, call_logs, agent_interactions, ai_usage_logs, daily_metrics, support_agents, locations, plus 30 supporting tables.
- 10 Specialist Agents: Triage, Device, Ticket, Network, Email, Computer, Printer, Phone, Security, Lookup (RAG via ChromaDB).
The 10 Agents
| Agent | Role | Sample Tools |
|---|---|---|
| Triage | Identify caller + classify problem | lookup_contact, classify_issue |
| Device | Identify and triage end-user device | get_device, check_warranty, lookup_serial |
| Ticket | Create, update, close support tickets | create_ticket, update_ticket, close_ticket |
| Network | Wi-Fi, VPN, router troubleshooting | check_network_status, run_traceroute, vpn_diag |
| Email | O365, Google Workspace, IMAP | reset_email_password, check_mail_flow |
| Computer | OS, drivers, peripherals | get_os_info, check_drivers, sw_install_status |
| Printer | Printer queue, drivers, setup | check_printer_status, clear_queue, install_driver |
| Phone | Softphone, deskphone, mobile MDM | check_softphone, mdm_status, reset_mobile |
| Security | Account lockout, MFA, suspicious activity | lookup_security_event, force_logout, mfa_reset |
| Lookup | RAG retrieval over ChromaDB KB | retrieve_kb, rerank, summarize_runbook |
The Lookup agent is the retrieval specialist. It is called by every other agent when their tool surface is not enough — for example, the Printer agent hits an unfamiliar model and asks Lookup to retrieve the runbook for that model.
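The delegation pattern described above can be sketched in a few lines. This is a minimal illustration, not U Rack IT's actual code: the `Specialist` class, its `handle` method, and the lambda tools are hypothetical names introduced here, and the real agents run on the OpenAI Agents SDK rather than plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Specialist:
    """Hypothetical specialist agent: a name, a tool surface, and a Lookup hook."""
    name: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    lookup: Optional[Callable[[str], str]] = None  # stand-in for the Lookup agent

    def handle(self, intent: str, detail: str) -> str:
        # Prefer a built-in tool when the specialist has one for this intent.
        tool = self.tools.get(intent)
        if tool is not None:
            return tool(detail)
        # Otherwise the tool surface is not enough: delegate to Lookup.
        if self.lookup is None:
            raise LookupError(f"{self.name} has no tool or Lookup for {intent!r}")
        return self.lookup(f"{self.name} {intent} {detail}")


printer = Specialist(
    name="printer",
    tools={"clear_queue": lambda device: f"queue cleared on {device}"},
    lookup=lambda query: f"runbook for: {query}",
)
print(printer.handle("clear_queue", "HP M404"))
print(printer.handle("replace_fuser", "HP M404"))
```

The point of the design is that the specialist keeps ownership of the conversation; Lookup is only ever reached through this fallback path.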
Vapi's Knowledge Base
Vapi's knowledge base feature lets you upload PDFs or markdown files. The assistant can reference them in answers. It is a competent feature for FAQ-style use cases. It is not a structured RAG layer. There is:
- No reranker.
- No chunking strategy you control.
- No multi-document fusion.
- No specialist agent that owns retrieval as its job.
- No taxonomy of IT-specific agents.
- No 40+ table schema for tickets, devices, contacts.
You can build all of this on Vapi. The platform is flexible. But the time and engineering cost is six to twelve months of focused work, and the product you build will not be tested against the customer base U Rack IT already serves.
Comparison
| Capability | U Rack IT | Vapi |
|---|---|---|
| IT specialist agents | 10 shipped | None — write your own |
| ChromaDB-backed RAG | Default | PDF upload only |
| Reranker on retrieval | Cohere/cross-encoder configurable | None native |
| Specialist Lookup agent | Yes | No |
| 40+ DB models for IT | Shipped | Build it |
| Role-based dashboard (Admin/Agent/Requester) | Shipped | Build it |
| Ticket creation as a function call | create_ticket tool | Build the tool |
| Device/serial lookup | Tools ready | Build it |
| Knowledge base ingestion pipeline | Crawler + chunker + embed | Manual upload |
| Multi-organization (MSP) tenancy | Built-in | Build it |
| Time to L1 automation live | Days | Quarters |
The RAG Retrieval Flow
```mermaid
graph TD
    A[Caller Asks Question] --> B[Triage Agent Classifies]
    B --> C{Specialist?}
    C -->|Printer| D[Printer Agent]
    C -->|Network| E[Network Agent]
    C -->|Other| F[Computer Agent]
    D --> G{Tool Surface Sufficient?}
    E --> G
    F --> G
    G -->|Yes| H[Run Tool + Resolve]
    G -->|No| I[Call Lookup Agent]
    I --> J[Embed Query: text-embedding-3-large]
    J --> K[ChromaDB Top-K Retrieve k=8]
    K --> L[Cross-Encoder Rerank to Top 3]
    L --> M[Lookup: Summarize Runbook]
    M --> N{Confidence > 0.7?}
    N -->|Yes| O[Return Steps to Specialist]
    N -->|No| P[Escalate: Open Ticket + Page Human]
    O --> Q[Specialist Walks Caller Through]
    Q --> R{Resolved?}
    R -->|Yes| S[Close Ticket + Log]
    R -->|No| T[Escalate to L2]
    P --> S
    T --> S
```
The retrieval pipeline runs entirely as a tool call from the specialist agent. The Lookup agent never speaks to the caller directly — it returns structured runbook steps that the Printer/Network/Computer agent then narrates.
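For readers who want the retrieve, rerank, and gate steps in code, here is a rough sketch under stated assumptions. The `embed` and `rerank_score` callables are injected stand-ins for the embedding model and the cross-encoder, the in-memory list stands in for a ChromaDB collection, and `Chunk`, `LookupResult`, and `lookup` are names invented for this example. Only k=8, top 3, and the 0.7 threshold come from the flow above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Chunk:
    text: str
    embedding: Sequence[float]


@dataclass
class LookupResult:
    action: str        # "steps" or "escalate"
    chunks: List[str]  # reranked chunk texts, best first
    confidence: float  # top reranker score


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(
    query: str,
    index: List[Chunk],
    embed: Callable[[str], Sequence[float]],
    rerank_score: Callable[[str, str], float],
    k: int = 8,
    top_n: int = 3,
    threshold: float = 0.7,
) -> LookupResult:
    # Stage 1: vector similarity, keep the k nearest chunks.
    q_vec = embed(query)
    candidates = sorted(index, key=lambda c: cosine(q_vec, c.embedding), reverse=True)[:k]
    # Stage 2: re-score each (query, chunk) pair and keep the best top_n.
    scored = sorted(
        ((rerank_score(query, c.text), c.text) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_n]
    best = scored[0][0] if scored else 0.0
    # Stage 3: confidence gate. Below the threshold, escalate instead of guessing.
    if best > threshold:
        return LookupResult("steps", [text for _, text in scored], best)
    return LookupResult("escalate", [], best)
```

A score of exactly 0.7 escalates in this sketch; the diagram's "Confidence > 0.7?" leaves that boundary open.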
Worked Example: Password Reset for an Unknown App
A user calls and says "I can't log into Loomly."
Turn 1 (Triage): Identifies the user, classifies as account/access. Hands to Security agent.
Turn 2 (Security): Security agent's tool surface knows Active Directory and O365 but has no built-in tool for Loomly. It calls the Lookup agent with query "Loomly password reset SSO procedure."
Turn 3 (Lookup):
- Embeds the query.
- ChromaDB returns 8 candidate chunks: 3 from internal MSP runbooks, 4 from Loomly's public docs (crawled), 1 from a forum.
- Cross-encoder reranks to top 3.
- Top match is internal runbook "Loomly SSO Reset (Tenant XYZ)" with confidence 0.84.
Turn 4 (Lookup): Returns: "1. Confirm SSO is enabled (yes per tenant config). 2. Have user navigate to https://loomly.com/sso/login. 3. Click 'Forgot Password'. 4. Check connected mailbox. 5. If MFA prompts, use Authenticator app. 6. If 2FA email lost, escalate to L2."
Turn 5 (Security): Walks the user through steps 1-5.
Turn 6: User logs in successfully.
Turn 7 (Ticket): Ticket agent creates support_ticket row, classification "L1 / Account Access / Resolved", time-to-resolution 4 minutes 12 seconds, agent_interactions logged for analytics.
On Vapi's PDF knowledge base, the same flow would have:
- No reranker (top-K straight retrieval, often surfacing the wrong chunk).
- No specialist routing (one assistant trying to hold all IT context).
- No structured ticket creation (you build it).
- No L2 escalation primitive.
MSP Multi-Tenancy
U Rack IT is multi-tenant by design. The organizations table is the tenant boundary. Every contact, device, ticket, and call_log is scoped to an org. ChromaDB collections are per-organization, so MSP-A's runbooks never bleed into MSP-B's RAG context. The role-based dashboard scopes views by tenant, role, and location.
Vapi has no tenancy model. You build it.
Chunking Strategy: The Detail That Decides Recall
RAG quality is mostly chunking quality. Bad chunks produce bad retrievals which produce bad answers. U Rack IT's chunker is tuned for IT runbooks specifically:
- Step-aware splitting: numbered steps stay together as units.
- Code-block preservation: shell snippets are not split mid-line.
- Heading hierarchy: section headers carry into chunk metadata for keyword filtering.
- Overlap: 80-token overlap between adjacent chunks for boundary recall.
- Max chunk size: 1024 tokens (up to 1104 once the 80-token overlap is included).
- Metadata: vendor, product, OS, ticket-category, last-updated, source-url.
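The step-aware splitting and overlap rules in the list above can be illustrated with a simplified chunker. This is a sketch, not the production chunker: it counts whitespace-separated words as a proxy for model tokens, it ignores code-block preservation and heading metadata, and the function names `split_units` and `chunk_runbook` are made up for this example.

```python
import re
from typing import List

# A line that starts a numbered step, e.g. "1. " or "2) ".
STEP_RE = re.compile(r"^\s*\d+[.)]\s")


def split_units(text: str) -> List[str]:
    """Group lines so each numbered step plus its continuation lines is one unit."""
    units: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if STEP_RE.match(line) or not units:
            units.append(line.strip())
        else:
            units[-1] += " " + line.strip()
    return units


def chunk_runbook(text: str, max_tokens: int = 1024, overlap: int = 80) -> List[str]:
    """Pack whole units into chunks; never split a step across two chunks."""
    chunks: List[str] = []
    current: List[str] = []
    for unit in split_units(text):
        tokens = unit.split()
        if current and len(current) + len(tokens) > max_tokens:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward for boundary recall.
            current = current[-overlap:] if overlap else []
        current.extend(tokens)
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = (
    "Reset the spooler\n"
    "1. Open services\n"
    "   and find Print Spooler\n"
    "2. Click restart\n"
    "3. Print a test page"
)
for chunk in chunk_runbook(doc, max_tokens=10, overlap=2):
    print(chunk)
```

Because the overlap is prepended after the size check, a chunk can exceed `max_tokens` by up to `overlap` words, and a single step longer than `max_tokens` is kept whole rather than split.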
The metadata is the secret. Retrieval can pre-filter by metadata before vector similarity, which narrows the candidate set and sharply improves precision. A "printer / HP M404" query first filters chunks where vendor='HP' and product='M404', then runs vector search on the narrowed set. In our internal benchmark, recall rises from 67% to 91%.
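A toy version of filter-then-search makes the effect concrete. The sketch below uses an in-memory list and plain cosine similarity; in ChromaDB the same narrowing would normally be expressed through the `where` filter on a collection query. The `Record` layout and `filtered_search` are hypothetical names for this example only.

```python
from typing import Dict, List, Sequence, Tuple

# (chunk text, embedding, metadata) as a stand-in for a stored ChromaDB record.
Record = Tuple[str, Sequence[float], Dict[str, str]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(
    query_vec: Sequence[float],
    records: List[Record],
    where: Dict[str, str],
    k: int = 8,
) -> List[str]:
    # Step 1: keep only records whose metadata matches every filter key.
    narrowed = [
        rec for rec in records
        if all(rec[2].get(key) == value for key, value in where.items())
    ]
    # Step 2: rank the narrowed set by vector similarity.
    ranked = sorted(narrowed, key=lambda rec: cosine(query_vec, rec[1]), reverse=True)
    return [rec[0] for rec in ranked[:k]]


records: List[Record] = [
    ("HP M404 paper jam", [0.9, 0.1], {"vendor": "HP", "product": "M404"}),
    ("Canon paper jam", [1.0, 0.0], {"vendor": "Canon", "product": "LBP"}),
    ("HP M404 driver install", [0.1, 0.9], {"vendor": "HP", "product": "M404"}),
]
print(filtered_search([1.0, 0.0], records, {"vendor": "HP", "product": "M404"}, k=2))
```

Note that the Canon chunk is the nearest vector to the query, yet it never reaches the ranking step because the filter removes it first.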
Vapi's PDF knowledge base does naive chunking (typically 500-character splits without metadata). Recall on the same benchmark: 38%.
Embedding Model Selection
We use OpenAI's text-embedding-3-large (3072-dim) for primary indexing. For high-volume collections we offer text-embedding-3-small (1536-dim) as a cost-optimized option. Customer-specific terminology (product code names, internal acronyms) is handled via a learned reranker fine-tuned on the customer's runbooks.
The reranker is a cross-encoder (BAAI/bge-reranker-v2 by default, or Cohere Rerank 3 on enterprise tier) that re-scores top-K candidates from ChromaDB by joint query+chunk attention. This is the second-largest precision gain after metadata filtering.
Vapi has no reranker option. You implement it yourself.
Per-Tenant Knowledge Base Isolation
Each MSP customer organization has its own ChromaDB collection. The collection is named org_{org_id}. Cross-collection retrieval is hard-disabled — there is no API path that allows MSP-A's agent to read MSP-B's runbooks.
Public vendor documentation (HP service manuals, Microsoft KB articles, Cisco product docs) lives in a shared "public" collection that all orgs can read. The agent always retrieves from org_X first, then falls back to public if confidence is low. This mirrors what a senior MSP technician does: check the internal runbook first, then the vendor's official docs.
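The org-first, public-fallback order might look like the following sketch. The `search` callable is a stand-in for a per-collection ChromaDB query that returns `(score, text)` pairs sorted best first; `retrieve_with_fallback` and the reuse of 0.7 as the fallback threshold are assumptions made for illustration, since the text only says "if confidence is low".

```python
from typing import Callable, List, Tuple

Hit = Tuple[float, str]  # (confidence score, chunk text)


def collection_name(org_id: int) -> str:
    # Naming convention from the text: one collection per organization.
    return f"org_{org_id}"


def retrieve_with_fallback(
    query: str,
    org_id: int,
    search: Callable[[str, str], List[Hit]],
    threshold: float = 0.7,      # assumed: same cut-off as the escalation gate
    public: str = "public",
) -> Tuple[str, List[Hit]]:
    """Return (collection used, hits). Only this org's collection and public are read."""
    own = collection_name(org_id)
    hits = search(own, query)
    if hits and hits[0][0] >= threshold:
        return own, hits
    # Internal runbooks were missing or weak: fall back to shared vendor docs.
    return public, search(public, query)


store = {
    "org_42": [(0.4, "old internal note")],
    "org_7": [(0.95, "another tenant's runbook")],
    "public": [(0.9, "HP service manual")],
}
search = lambda collection, query: store.get(collection, [])
print(retrieve_with_fallback("fuser error", 42, search))
```

The function never accepts a collection name from the caller, only an `org_id`, which is one simple way to keep another tenant's collection unreachable from this code path.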
Tool-Driven Resolution vs Knowledge-Only Answers
Pure RAG can tell you "the steps to reset an O365 password are..." but cannot actually reset the password. U Rack IT goes further: agents have executable tools that perform the action, not just describe it. The Email agent has reset_email_password, which talks to the Microsoft Graph API. The Security agent has force_logout, which revokes tokens. The Computer agent has sw_install_status, which queries the endpoint via the MSP's RMM (NinjaRMM, Atera, ConnectWise Automate).
This is the difference between an answer and a resolution. Vapi-based RAG systems describe; U Rack IT does.
FAQ
How is the ChromaDB knowledge base populated?
A crawler ingests internal runbooks (Confluence, Notion, SharePoint, file shares) and public vendor docs. A chunker splits at semantic boundaries. text-embedding-3-large produces 3072-dim vectors. ChromaDB stores them per-organization.
How often is the RAG re-indexed?
Incremental re-indexing runs daily and a full re-index runs weekly. The crawler detects changes to source documents by diffing them against the previous crawl.
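One common way to implement this kind of change detection is to compare content hashes between crawls. The sketch below assumes that approach; the source does not say how the crawler actually diffs documents, and `content_hash` and `diff_sources` are illustrative names.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_sources(
    previous: Dict[str, str],      # doc_id -> hash from the last crawl
    current_docs: Dict[str, str],  # doc_id -> document body from this crawl
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Return (docs to re-embed, docs to delete from the index, new hash map)."""
    new_hashes = {doc_id: content_hash(body) for doc_id, body in current_docs.items()}
    changed = sorted(d for d, h in new_hashes.items() if previous.get(d) != h)
    removed = sorted(d for d in previous if d not in new_hashes)
    return changed, removed, new_hashes


# First crawl: everything is new.
changed, removed, hashes = diff_sources({}, {"vpn.md": "v1", "printer.md": "v1"})
# Second crawl: one document edited, one deleted.
changed2, removed2, hashes2 = diff_sources(hashes, {"vpn.md": "v2"})
print(changed2, removed2)
```

A daily incremental run would re-embed only the `changed` list and drop `removed`, while the weekly full run would ignore the stored hashes and rebuild everything.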
Can the Lookup agent call multiple specialists' tools?
The Lookup agent only retrieves. It returns structured runbook steps to the calling specialist, which then runs its own tools (e.g., reset_email_password). This separation is what makes the architecture stable.
What about hallucination?
The Lookup agent is constrained: if confidence is below 0.7, it returns "escalate" rather than guessing. Specialists are also instructed never to invent steps that contradict the runbook.
Does U Rack IT integrate with ConnectWise/Datto/Kaseya?
Yes — bidirectional sync for tickets, contacts, and devices via webhook adapters.
How is sensitive data handled?
ChromaDB collections are encrypted at rest with per-tenant keys. Embedded chunks never contain raw passwords or tokens — the ingester scrubs secrets before embedding. Caller authentication is multi-factor when required (caller ID + verification question + optional MFA via SMS).
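As an illustration of scrubbing secrets before embedding, here is a small regex-based sketch. The patterns and the `scrub` function are assumptions for this example, not the ingester's real rule set; a production scrubber would need a broader, tested set of detectors.

```python
import re

# Hypothetical patterns: "password: value" style assignments and bearer tokens.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|pwd|secret|api[_-]?key|token)\b(\s*[:=]\s*)(\S+)"),
    re.compile(r"(?i)\b(bearer)(\s+)([A-Za-z0-9._\-]{16,})"),
]


def scrub(text: str) -> str:
    """Replace secret values with a placeholder, keeping the label for context."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return text


print(scrub("Set password: Hunter2! then retry"))
print(scrub("Authorization: Bearer abcdefghijklmnop1234"))
print(scrub("Click Forgot Password on the login page"))
```

Keeping the label ("password:") while dropping the value preserves the runbook's meaning for retrieval, and ordinary prose that merely mentions the word "password" is left untouched.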
Can the Lookup agent be used as a chatbot too?
Yes — the same RAG pipeline runs in a Slack/Teams bot for the Requester role. The Lookup agent's interface is channel-agnostic.
What happens if the runbook is wrong?
Tickets created by the agent capture the runbook chunk(s) used. If the runbook is wrong, agents and admins can flag the chunk in the dashboard, which creates a remediation task. Frequent flags trigger a re-ingestion of the source document.
Automate L1, Keep the Margin
If your MSP is bleeding margin on password resets and printer jams, U Rack IT pays back inside one quarter. Book a demo at /demo and we will run your real ticket categories through the 10-agent stack live.