
IT Helpdesk RAG with ChromaDB: CallSphere 10 Agents vs Vapi

CallSphere's U Rack IT ships ChromaDB-backed RAG plus 10 IT specialist agents. Vapi has knowledge-base PDF upload but no specialist agent for it.

TL;DR

CallSphere's U Rack IT product ships 10 IT specialist agents (Triage, Device, Ticket, Network, Email, Computer, Printer, Phone, Security, Lookup) and a ChromaDB-backed RAG layer, with the dedicated Lookup agent as the retrieval specialist. Vapi.ai offers PDF knowledge-base upload, which is useful but generic, and gives you a single voice agent to query it. There is no specialist taxonomy, no Lookup agent, no 40+ table data model, no role-based dashboard for Admin/Agent/Requester, and no MSP-aware ticketing schema. This post compares the two architectures and walks through a worked example: RAG retrieval for an L1 password reset.

The MSP Problem: L1 Volume Crushes Margins

Managed Service Providers (MSPs) live and die on L1 ticket margin. Service Leadership Index 2025 reports that 62% of MSP ticket volume is L1 — password resets, printer jams, basic network connectivity, email setup, software install — and that L1 tickets average $23 in fully loaded support cost while billing $18-22 per ticket. The math is brutal: every L1 ticket the MSP touches with a human is a margin loser.
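The per-ticket arithmetic behind "margin loser" can be checked with a quick sketch that uses only the cost and billing figures quoted above. The monthly ticket volume in the example is a hypothetical input, not a Service Leadership figure.

```python
COST_PER_L1_TICKET = 23.00       # fully loaded cost per L1 ticket (figure quoted above)
BILLING_RANGE = (18.00, 22.00)   # per-ticket billing range (figure quoted above)


def l1_margin_range(cost: float, billing: tuple[float, float]) -> tuple[float, float]:
    """Return (worst, best) margin on one human-handled L1 ticket."""
    low, high = billing
    return (low - cost, high - cost)


def monthly_l1_loss(total_tickets: int, l1_share: float, cost: float,
                    billing: tuple[float, float]) -> tuple[float, float]:
    """Margin range for a month of human-handled L1 tickets (volume is an input)."""
    l1_tickets = total_tickets * l1_share
    worst, best = l1_margin_range(cost, billing)
    return (l1_tickets * worst, l1_tickets * best)


if __name__ == "__main__":
    print(l1_margin_range(COST_PER_L1_TICKET, BILLING_RANGE))  # (-5.0, -1.0)
    # Hypothetical MSP handling 2,000 tickets a month, 62% of them L1.
    print(monthly_l1_loss(2000, 0.62, COST_PER_L1_TICKET, BILLING_RANGE))
```

With these figures every human-handled L1 ticket loses between $1 and $5, so the loss grows linearly with L1 volume.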

The fix every MSP CEO has been chasing for five years is L1 voice automation with knowledge-base retrieval. It is not a chatbot — clients call. It is not a generic voice agent — a printer ticket needs different troubleshooting than a network ticket. It is not a single RAG endpoint — the agent needs to call retrieval as a tool, multiple times, with reranking. It is a structured multi-agent system with a knowledge layer.

That is exactly what U Rack IT is.

The U Rack IT Architecture

U Rack IT is built for IT, MSP, and enterprise helpdesks. The stack:

  • Backend: Python FastAPI with OpenAI Realtime API + Agents SDK; NestJS + Prisma for the API layer.
  • Frontend: React + Tailwind, role-based dashboard (Admin / Agent / Requester).
  • Database: PostgreSQL + Supabase + ChromaDB.
  • 40+ DB models: account_managers, organizations, contacts, devices, support_tickets, call_logs, agent_interactions, ai_usage_logs, daily_metrics, support_agents, locations, plus 30 supporting tables.
  • 10 Specialist Agents: Triage, Device, Ticket, Network, Email, Computer, Printer, Phone, Security, Lookup (RAG via ChromaDB).

The 10 Agents

| Agent | Role | Sample Tools |
|---|---|---|
| Triage | Identify caller + classify problem | lookup_contact, classify_issue |
| Device | Identify and triage end-user device | get_device, check_warranty, lookup_serial |
| Ticket | Create, update, close support tickets | create_ticket, update_ticket, close_ticket |
| Network | Wi-Fi, VPN, router troubleshooting | check_network_status, run_traceroute, vpn_diag |
| Email | O365, Google Workspace, IMAP | reset_email_password, check_mail_flow |
| Computer | OS, drivers, peripherals | get_os_info, check_drivers, sw_install_status |
| Printer | Printer queue, drivers, setup | check_printer_status, clear_queue, install_driver |
| Phone | Softphone, deskphone, mobile MDM | check_softphone, mdm_status, reset_mobile |
| Security | Account lockout, MFA, suspicious activity | lookup_security_event, force_logout, mfa_reset |
| Lookup | RAG retrieval over ChromaDB KB | retrieve_kb, rerank, summarize_runbook |

The Lookup agent is the retrieval specialist. It is called by every other agent when their tool surface is not enough — for example, the Printer agent hits an unfamiliar model and asks Lookup to retrieve the runbook for that model.
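A minimal sketch of that delegation rule, assuming each specialist is described only by the set of tool names listed in the agent table. The `Specialist` class and `handle` function are hypothetical illustrations; they are not U Rack IT's code or the OpenAI Agents SDK API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Specialist:
    name: str
    tools: frozenset[str]


# Tool surfaces copied from the agent table (two of the ten agents shown).
PRINTER = Specialist("Printer", frozenset({"check_printer_status", "clear_queue", "install_driver"}))
LOOKUP = Specialist("Lookup", frozenset({"retrieve_kb", "rerank", "summarize_runbook"}))


def handle(specialist: Specialist, chosen_tool: Optional[str], problem: str) -> str:
    """Use a native tool if the specialist owns it; otherwise delegate to Lookup.

    `chosen_tool` is whatever the specialist's model picked for this problem,
    or None when nothing in its tool surface applies.
    """
    if chosen_tool is not None and chosen_tool in specialist.tools:
        return f"{specialist.name}: run {chosen_tool}"
    # Tool surface is not enough: ask the retrieval specialist for a runbook.
    return f"{specialist.name} -> {LOOKUP.name}: retrieve_kb({problem!r})"
```

For example, `handle(PRINTER, "clear_queue", "paper jam")` runs the native tool, while `handle(PRINTER, None, "HP M404 firmware error")` returns a delegation to Lookup.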

Vapi's Knowledge Base

Vapi's knowledge base feature lets you upload PDFs or markdown files. The assistant can reference them in answers. It is a competent feature for FAQ-style use cases. It is not a structured RAG layer. There is:

  • No reranker.
  • No chunking strategy you control.
  • No multi-document fusion.
  • No specialist agent that owns retrieval as its job.
  • No taxonomy of IT-specific agents.
  • No 40+ table schema for tickets, devices, contacts.

You can build all of this on Vapi. The platform is flexible. But the time and engineering cost is six to twelve months of focused work, and the product you build will not be tested against the customer base U Rack IT already serves.

Comparison

| Capability | U Rack IT | Vapi |
|---|---|---|
| IT specialist agents | 10 shipped | None; write your own |
| ChromaDB-backed RAG | Default | PDF upload only |
| Reranker on retrieval | Cohere/cross-encoder, configurable | None native |
| Specialist Lookup agent | Yes | No |
| 40+ DB models for IT | Shipped | Build it |
| Role-based dashboard (Admin/Agent/Requester) | Shipped | Build it |
| Ticket creation as a function call | create_ticket tool | Build the tool |
| Device/serial lookup | Tools ready | Build it |
| Knowledge base ingestion pipeline | Crawler + chunker + embed | Manual upload |
| Multi-organization (MSP) tenancy | Built-in | Build it |
| Time to L1 automation live | Days | Quarters |

The RAG Retrieval Flow

```mermaid
graph TD
    A[Caller Asks Question] --> B[Triage Agent Classifies]
    B --> C{Specialist?}
    C -->|Printer| D[Printer Agent]
    C -->|Network| E[Network Agent]
    C -->|Other| F[Computer Agent]
    D --> G{Tool Surface Sufficient?}
    E --> G
    F --> G
    G -->|Yes| H[Run Tool + Resolve]
    G -->|No| I[Call Lookup Agent]
    I --> J["Embed Query: text-embedding-3-large"]
    J --> K["ChromaDB Top-K Retrieve k=8"]
    K --> L[Cross-Encoder Rerank to Top 3]
    L --> M["Lookup: Summarize Runbook"]
    M --> N{"Confidence > 0.7?"}
    N -->|Yes| O[Return Steps to Specialist]
    N -->|No| P["Escalate: Open Ticket + Page Human"]
    O --> Q[Specialist Walks Caller Through]
    Q --> R{Resolved?}
    R -->|Yes| S[Close Ticket + Log]
    R -->|No| T[Escalate to L2]
    P --> S
    T --> S
```

The retrieval pipeline runs entirely as a tool call from the specialist agent. The Lookup agent never speaks to the caller directly — it returns structured runbook steps that the Printer/Network/Computer agent then narrates.
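The Lookup side of the flowchart can be sketched as one function: retrieve k=8 candidates, rerank, keep the top 3, and return either structured steps or an escalation. The `retrieve`, `rerank`, and `summarize` callables are hypothetical stand-ins for ChromaDB, the cross-encoder, and the summarizer, and the `LookupResult` shape is an assumption about what "structured runbook steps" could look like.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence

TOP_K = 8                   # candidates pulled from the vector store
RERANK_TOP_N = 3            # candidates kept after reranking
CONFIDENCE_THRESHOLD = 0.7  # flowchart: return steps only when confidence > 0.7


@dataclass
class LookupResult:
    status: str                          # "steps" or "escalate"
    steps: list[str] = field(default_factory=list)
    source: Optional[str] = None         # chunk id the steps came from
    confidence: float = 0.0


# (query, k) -> [(chunk_id, text)]
Retrieve = Callable[[str, int], Sequence[tuple[str, str]]]
# (query, candidates) -> [(chunk_id, text, score)]
Rerank = Callable[[str, Sequence[tuple[str, str]]], list[tuple[str, str, float]]]
# chunk text -> ordered runbook steps
Summarize = Callable[[str], list[str]]


def lookup(query: str, retrieve: Retrieve, rerank: Rerank, summarize: Summarize) -> LookupResult:
    candidates = retrieve(query, TOP_K)
    ranked = sorted(rerank(query, candidates), key=lambda r: r[2], reverse=True)[:RERANK_TOP_N]
    if not ranked or ranked[0][2] <= CONFIDENCE_THRESHOLD:
        # Never guess: the calling specialist opens a ticket and pages a human.
        return LookupResult(status="escalate", confidence=ranked[0][2] if ranked else 0.0)
    chunk_id, text, score = ranked[0]
    return LookupResult(status="steps", steps=summarize(text), source=chunk_id, confidence=score)
```

Injecting the three stages keeps the confidence gate testable with stubs: a top reranked score of 0.84, as in the Loomly example below, returns steps, while a score of exactly 0.7 escalates, matching the flowchart's strict `> 0.7` check.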

Worked Example: Password Reset for an Unknown App

A user calls and says "I can't log into Loomly."

Turn 1 (Triage): Identifies the user, classifies as account/access. Hands to Security agent.

Turn 2 (Security): Security agent's tool surface knows Active Directory and O365 but has no built-in tool for Loomly. It calls the Lookup agent with query "Loomly password reset SSO procedure."

Turn 3 (Lookup):

  • Embeds the query.
  • ChromaDB returns 8 candidate chunks: 3 from internal MSP runbooks, 4 from Loomly's public docs (crawled), 1 from a forum.
  • Cross-encoder reranks to top 3.
  • Top match is internal runbook "Loomly SSO Reset (Tenant XYZ)" with confidence 0.84.

Turn 4 (Lookup): Returns:

  1. Confirm SSO is enabled (yes per tenant config).
  2. Have user navigate to https://loomly.com/sso/login.
  3. Click 'Forgot Password'.
  4. Check connected mailbox.
  5. If MFA prompts, use Authenticator app.
  6. If 2FA email lost, escalate to L2.

Turn 5 (Security): Walks the user through steps 1-5.

Turn 6: User logs in successfully.


Turn 7 (Ticket): The Ticket agent creates a support_tickets row with classification "L1 / Account Access / Resolved" and a time-to-resolution of 4 minutes 12 seconds, and logs the agent_interactions for analytics.
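A sketch of the row the Ticket agent might write, assuming a dataclass mirror of the `support_tickets` table. The column names are hypothetical; only the classification string, the 4 minutes 12 seconds figure, and the idea of recording which runbook chunks were used come from this post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SupportTicket:
    org_id: str
    contact_id: str
    classification: str        # e.g. "L1 / Account Access / Resolved"
    opened_at: datetime
    closed_at: datetime
    kb_chunk_ids: list[str]    # runbook chunks used, so bad runbooks can be flagged later

    @property
    def time_to_resolution(self) -> timedelta:
        return self.closed_at - self.opened_at

    def ttr_label(self) -> str:
        """Human-readable time-to-resolution, e.g. '4 minutes 12 seconds'."""
        minutes, seconds = divmod(int(self.time_to_resolution.total_seconds()), 60)
        return f"{minutes} minutes {seconds} seconds"
```

Storing both timestamps and deriving the duration avoids a separately maintained time-to-resolution column drifting out of sync.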

On Vapi's PDF knowledge base, the same flow would have:

  • No reranker (top-K straight retrieval, often surfacing the wrong chunk).
  • No specialist routing (one assistant trying to hold all IT context).
  • No structured ticket creation (you build it).
  • No L2 escalation primitive.

MSP Multi-Tenancy

U Rack IT is multi-tenant by design. The organizations table is the tenant boundary. Every contact, device, ticket, and call_log is scoped to an org. ChromaDB collections are per-organization, so MSP-A's runbooks never bleed into MSP-B's RAG context. The role-based dashboard scopes views by tenant, role, and location.
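One way to make the organization boundary hard to bypass in application code is to bind every read and write to an `org_id` when the data-access object is created. This is a hypothetical in-memory sketch; in a PostgreSQL/Supabase stack the same guarantee could also be pushed down to the database with row-level security policies, though the post does not say whether U Rack IT does this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    table: str
    org_id: str
    data: dict


class CrossTenantError(PermissionError):
    """Raised when code tries to touch another organization's rows."""


class TenantScopedStore:
    """Every query is pinned to one organization, the tenant boundary."""

    def __init__(self, rows: list[Row], org_id: str):
        self._rows = rows
        self.org_id = org_id

    def select(self, table: str) -> list[dict]:
        # The org filter lives here, not in callers, so it cannot be forgotten.
        return [r.data for r in self._rows if r.table == table and r.org_id == self.org_id]

    def insert(self, table: str, data: dict, org_id: Optional[str] = None) -> None:
        if org_id is not None and org_id != self.org_id:
            raise CrossTenantError(f"write to {org_id} from tenant {self.org_id}")
        self._rows.append(Row(table, self.org_id, data))
```

Two stores sharing the same underlying rows still see only their own organization's devices, contacts, or tickets.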

Vapi has no tenancy model. You build it.

Chunking Strategy: The Detail That Decides Recall

RAG quality is mostly chunking quality. Bad chunks produce bad retrievals which produce bad answers. U Rack IT's chunker is tuned for IT runbooks specifically:

  • Step-aware splitting: numbered steps stay together as units.
  • Code-block preservation: shell snippets are not split mid-line.
  • Heading hierarchy: section headers carry into chunk metadata for keyword filtering.
  • Overlap: 80-token overlap between adjacent chunks for boundary recall.
  • Max chunk size: 1024 tokens of new content; with the 80-token overlap prepended, a chunk can reach 1104 tokens.
  • Metadata: vendor, product, OS, ticket-category, last-updated, source-url.
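A simplified sketch of these chunking rules, assuming markdown-formatted runbooks and using whitespace-separated words as a stand-in for real tokenizer counts. It keeps fenced code blocks and blank-line-delimited step runs intact, records the heading path as metadata, packs units up to 1024 new tokens, and prefixes each follow-on chunk in the same section with the previous chunk's last 80 tokens. The function names are hypothetical; this is not U Rack IT's chunker.

```python
import re
from dataclasses import dataclass, field

MAX_TOKENS = 1024    # fresh tokens per chunk
OVERLAP_TOKENS = 80  # tokens carried over from the previous chunk in the same section
FENCE = "`" * 3      # markdown code-fence marker


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate tokens; swap in a real tokenizer.
    return len(text.split())


def iter_units(doc: str):
    """Yield (heading_path, unit) pairs; a unit is never split across chunks.

    Units are whole fenced code blocks or runs of consecutive non-blank lines,
    so a numbered step list written without blank lines stays together.
    """
    headings: list[str] = []
    buf: list[str] = []
    in_code = False
    for line in doc.splitlines():
        if line.startswith(FENCE):
            if not in_code and buf:        # flush prose preceding the fence
                yield tuple(headings), "\n".join(buf)
                buf = []
            buf.append(line)
            in_code = not in_code
            if not in_code:                # closing fence: emit the whole block
                yield tuple(headings), "\n".join(buf)
                buf = []
        elif in_code:
            buf.append(line)               # code lines (even blank ones) stay together
        elif (m := re.match(r"(#+)\s+(.*)", line)):
            if buf:
                yield tuple(headings), "\n".join(buf)
                buf = []
            level = len(m.group(1))
            headings = headings[: level - 1] + [m.group(2).strip()]
        elif not line.strip():
            if buf:
                yield tuple(headings), "\n".join(buf)
                buf = []
        else:
            buf.append(line)
    if buf:
        yield tuple(headings), "\n".join(buf)


def chunk_document(doc: str, base_metadata: dict) -> list[Chunk]:
    chunks: list[Chunk] = []
    parts: list[str] = []   # pieces of the chunk being built
    fresh = 0               # tokens of new content in `parts` (overlap excluded)
    path: tuple = ()

    def emit() -> None:
        meta = {**base_metadata, "headings": " > ".join(path)}
        chunks.append(Chunk("\n\n".join(parts), meta))

    for unit_path, unit in iter_units(doc):
        size = count_tokens(unit)
        if parts and unit_path != path:
            emit()                         # new section: start clean, no overlap
            parts, fresh = [], 0
        elif parts and fresh + size > MAX_TOKENS:
            emit()
            # Overlap: last OVERLAP_TOKENS words of the previous chunk (whitespace collapsed).
            tail = " ".join(parts).split()[-OVERLAP_TOKENS:]
            parts, fresh = [" ".join(tail)], 0
        path = unit_path
        parts.append(unit)
        fresh += size
    if parts:
        emit()
    return chunks
```

A single unit larger than 1024 words (for example a very long code block) still becomes one oversized chunk in this sketch; a production chunker would need a secondary split for that case.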

The metadata is the secret. Retrieval can pre-filter by metadata before computing vector similarity, which sharply improves precision. A "printer / HP M404" query first filters to chunks where vendor='HP' and product='M404', then runs vector search on the narrowed set. In our internal benchmark, recall rose from 67% to 91%.

Vapi's PDF knowledge base does naive chunking (typically 500-character splits without metadata). Recall on the same benchmark: 38%.
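The filter-then-search order described above can be shown with a small in-memory index. In ChromaDB itself the same restriction is expressed through the `where` argument of `Collection.query` (for example `where={"$and": [{"vendor": "HP"}, {"product": "M404"}]}`); the pure-Python version below is an illustrative stand-in with toy vectors, not U Rack IT's retrieval code.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    chunk_id: str
    embedding: list[float]
    metadata: dict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(index: list[IndexedChunk], query_vec: list[float],
                    where: dict, k: int = 8) -> list[tuple[str, float]]:
    """Exact-match metadata pre-filter, then rank the survivors by cosine similarity."""
    candidates = [c for c in index
                  if all(c.metadata.get(key) == value for key, value in where.items())]
    scored = [(c.chunk_id, cosine(c.embedding, query_vec)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Without the filter, a chunk for a neighboring model (say an M479 paper-jam runbook) that sits almost on top of the M404 chunk in vector space would compete for the top slots; with it, that chunk never enters the candidate set.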

Embedding Model Selection

We use OpenAI's text-embedding-3-large (3072-dim) for primary indexing. For high-volume collections we offer text-embedding-3-small (1536-dim) as a cost-optimized option. Customer-specific terminology (product code names, internal acronyms) is handled via a learned reranker fine-tuned on the customer's runbooks.
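One concrete part of the cost trade-off is raw vector storage. Assuming embeddings are stored as 32-bit floats and ignoring index, metadata, and document-text overhead (both assumptions; real ChromaDB disk usage will be higher), halving the dimensions halves the bytes per chunk:

```python
BYTES_PER_FLOAT32 = 4

EMBEDDING_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}


def raw_vector_bytes(model: str, num_chunks: int) -> int:
    """Raw float32 storage for `num_chunks` embeddings (no index, metadata, or text)."""
    return EMBEDDING_DIMS[model] * BYTES_PER_FLOAT32 * num_chunks


if __name__ == "__main__":
    for model in EMBEDDING_DIMS:
        gib = raw_vector_bytes(model, 1_000_000) / 2**30
        print(f"{model}: {gib:.2f} GiB per million chunks")
```

Per million chunks that works out to roughly 11.44 GiB of raw vectors for text-embedding-3-large versus 5.72 GiB for text-embedding-3-small.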

The reranker is a cross-encoder (BAAI/bge-reranker-v2 by default, or Cohere Rerank 3 on the enterprise tier) that re-scores the top-K candidates from ChromaDB by attending over the query and each chunk jointly. It delivers the second-largest precision gain after metadata filtering.
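Reranking reduces to scoring each (query, chunk) pair jointly and re-sorting. In the sketch below the scorer is injected; with the `sentence-transformers` library, a cross-encoder's `CrossEncoder.predict` over a list of (query, text) pairs could fill that role. The toy word-overlap scorer included for testing is purely illustrative and is not a cross-encoder.

```python
from typing import Callable, Sequence

# Takes a batch of (query, text) pairs, returns one relevance score per pair.
PairScorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: Sequence[tuple[str, str]], score_pairs: PairScorer,
           top_n: int = 3) -> list[tuple[str, str, float]]:
    """Re-score (chunk_id, text) candidates against the query and keep the best top_n."""
    pairs = [(query, text) for _, text in candidates]
    scores = score_pairs(pairs)  # one batched call, as a cross-encoder would be used
    ranked = [(cid, text, float(s)) for (cid, text), s in zip(candidates, scores)]
    ranked.sort(key=lambda r: r[2], reverse=True)
    return ranked[:top_n]


def overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    """Toy stand-in: fraction of query words that also appear in the chunk."""
    out = []
    for q, text in pairs:
        q_words = set(q.lower().split())
        out.append(len(q_words & set(text.lower().split())) / max(len(q_words), 1))
    return out
```

Because the scorer is an argument, swapping the toy scorer for a real cross-encoder changes only what is passed as `score_pairs`.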

Vapi has no reranker option. You implement it yourself.

Per-Tenant Knowledge Base Isolation

Each MSP customer organization has its own ChromaDB collection. The collection is named org_{org_id}. Cross-collection retrieval is hard-disabled — there is no API path that allows MSP-A's agent to read MSP-B's runbooks.

Public vendor documentation (HP service manuals, Microsoft KB articles, Cisco product docs) lives in a shared "public" collection that all orgs can read. The agent always retrieves from org_{org_id} first and falls back to public if confidence is low. This mirrors what a senior MSP technician does: check the internal runbook first, then the vendor's official docs.
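The org-first, public-fallback order can be sketched as follows. The collection names `org_{org_id}` and `public` come from the text; the fallback cutoff is not specified, so the sketch reuses the 0.7 confidence threshold as an assumption, and `search` is a hypothetical stand-in for a ChromaDB query returning (chunk_id, score) pairs.

```python
from typing import Callable

FALLBACK_THRESHOLD = 0.7   # assumption: the post does not give the fallback cutoff
PUBLIC_COLLECTION = "public"

# (collection_name, query) -> [(chunk_id, score)]
Search = Callable[[str, str], list[tuple[str, float]]]


def collection_for(org_id: str) -> str:
    return f"org_{org_id}"


def tenant_retrieve(org_id: str, query: str, search: Search) -> tuple[str, list[tuple[str, float]]]:
    """Search the caller's own collection first; fall back to shared vendor docs.

    Only two collection names can be produced here, the tenant's own and
    "public", so another tenant's runbooks are unreachable from this path.
    """
    own = collection_for(org_id)
    hits = search(own, query)
    if hits and max(score for _, score in hits) >= FALLBACK_THRESHOLD:
        return own, hits
    return PUBLIC_COLLECTION, search(PUBLIC_COLLECTION, query)
```

Deriving the collection name from the authenticated `org_id`, rather than accepting it from the caller, is what keeps cross-collection reads off the table.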

Tool-Driven Resolution vs Knowledge-Only Answers

Pure RAG can tell you "the steps to reset an O365 password are..." but cannot actually reset the password. U Rack IT goes further: agents have executable tools that perform the action, not just describe it. The Email agent has reset_email_password, which talks to the Microsoft Graph API. The Security agent has force_logout, which revokes tokens. The Computer agent has sw_install_status, which queries the endpoint via the MSP's RMM (NinjaRMM, Atera, ConnectWise Automate).

This is the difference between an answer and a resolution. Vapi-based RAG systems describe; U Rack IT does.
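The difference can be made concrete: a describe-only answer returns text, while a tool returns the outcome of an action it performed. In this sketch `DirectoryClient` is a hypothetical protocol standing in for an identity-provider client (such as a wrapper around Microsoft Graph); nothing here reproduces U Rack IT's actual `reset_email_password` implementation.

```python
import secrets
from dataclasses import dataclass
from typing import Protocol


class DirectoryClient(Protocol):
    """Hypothetical wrapper around the identity provider."""

    def set_temporary_password(self, user_id: str, password: str, force_change: bool) -> None: ...


@dataclass
class ToolResult:
    ok: bool
    message: str


def reset_email_password(client: DirectoryClient, user_id: str, verified: bool) -> ToolResult:
    """Act, don't describe: set a temporary password once the caller is verified."""
    if not verified:
        return ToolResult(False, "caller not verified; route to Security agent")
    # Random one-time password; how it reaches the user is out of scope for this sketch.
    temp = secrets.token_urlsafe(12)
    client.set_temporary_password(user_id, temp, force_change=True)
    return ToolResult(True, f"temporary password set for {user_id}; change required at next sign-in")
```

The verification check sits inside the tool rather than in the prompt, so an unverified caller cannot trigger the action however the conversation goes.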

FAQ

How is the ChromaDB knowledge base populated?

A crawler ingests internal runbooks (Confluence, Notion, SharePoint, file shares) and public vendor docs. A chunker splits at semantic boundaries. text-embedding-3-large produces 3072-dim vectors. ChromaDB stores them per-organization.

How often is the RAG re-indexed?

Daily incremental, weekly full. Crawler diff-detects changes to source documents.
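Diff detection for the daily incremental pass could be as simple as comparing content hashes against the previous crawl. This is a sketch under that assumption; the post does not say how the crawler detects changes, and the function names are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_crawl(previous: dict[str, str], current_docs: dict[str, str]) -> dict[str, list[str]]:
    """Compare {source_url: hash} from the last crawl with {source_url: text} now.

    "added" and "changed" documents need re-chunking and re-embedding;
    "removed" documents need their chunks deleted from the collection.
    """
    current = {url: content_hash(text) for url, text in current_docs.items()}
    return {
        "added": sorted(u for u in current if u not in previous),
        "changed": sorted(u for u in current if u in previous and previous[u] != current[u]),
        "removed": sorted(u for u in previous if u not in current),
    }
```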

Can the Lookup agent call multiple specialists' tools?

The Lookup agent only retrieves. It returns structured runbook steps to the calling specialist, which then runs its own tools (e.g., reset_email_password). This separation is what makes the architecture stable.

What about hallucination?

The Lookup agent is constrained: if confidence is below 0.7, it returns "escalate" rather than guessing. Specialists are also instructed never to invent steps that contradict the runbook.

Does U Rack IT integrate with ConnectWise/Datto/Kaseya?

Yes — bidirectional sync for tickets, contacts, and devices via webhook adapters.

How is sensitive data handled?

ChromaDB collections are encrypted at rest with per-tenant keys. Embedded chunks never contain raw passwords or tokens — the ingester scrubs secrets before embedding. Caller authentication is multi-factor when required (caller ID + verification question + optional MFA via SMS).
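A minimal scrubbing pass before embedding might look like the following. The patterns are illustrative assumptions (a `password:`/`password=` assignment, an AWS-style access key ID, a bearer token), not U Rack IT's actual rule set, and regex scrubbing alone will miss secrets that do not match a known shape.

```python
import re

# Illustrative patterns only; a real ingester needs a broader, tested rule set.
SECRET_PATTERNS = [
    (re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"), r"\1: [REDACTED]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9\-._~+/]+=*"), "Bearer [REDACTED]"),
]


def scrub_secrets(text: str) -> tuple[str, int]:
    """Return the scrubbed text and how many secrets were replaced."""
    total = 0
    for pattern, replacement in SECRET_PATTERNS:
        text, n = pattern.subn(replacement, text)
        total += n
    return text, total
```

Returning the replacement count lets the ingester log or quarantine documents that contained secrets instead of silently embedding the cleaned text.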

Can the Lookup agent be used as a chatbot too?

Yes — the same RAG pipeline runs in a Slack/Teams bot for the Requester role. The Lookup agent's interface is channel-agnostic.

What happens if the runbook is wrong?

Tickets created by the agent capture the runbook chunk(s) used. If the runbook is wrong, agents and admins can flag the chunk in the dashboard, which creates a remediation task. Frequent flags trigger a re-ingestion of the source document.
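The flag-to-reingestion loop can be sketched with a counter per source document. The threshold of three flags is a hypothetical value (the post only says "frequent flags"), and the class and method names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field

REINGEST_AFTER_FLAGS = 3   # hypothetical threshold for "frequent flags"


@dataclass
class RunbookFeedback:
    flags: Counter = field(default_factory=Counter)            # source_url -> flag count
    remediation_tasks: list[str] = field(default_factory=list)
    reingest_queue: list[str] = field(default_factory=list)

    def flag_chunk(self, chunk_id: str, source_url: str) -> None:
        """Record a dashboard flag; queue the source for re-ingestion when flags pile up."""
        self.remediation_tasks.append(f"review {chunk_id} ({source_url})")
        self.flags[source_url] += 1
        if self.flags[source_url] == REINGEST_AFTER_FLAGS:
            self.reingest_queue.append(source_url)   # queued once, at the threshold
```

Counting per source document rather than per chunk means several flagged chunks from one stale runbook add up to a single re-ingestion.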

Automate L1, Keep the Margin

If your MSP is bleeding margin on password resets and printer jams, U Rack IT pays back inside one quarter. Book a demo at /demo and we will run your real ticket categories through the 10-agent stack live.
