Text Classification for Agent Routing: Intent Detection and Topic Categorization

Why Classification Is the Agent's First Decision

Every message that reaches a multi-agent system must be routed to the right handler. A billing question should go to the billing agent. A technical issue should go to the support agent. A sales inquiry should go to the sales agent. Text classification is the mechanism that makes this routing fast and accurate.

Intent detection is a specialized form of text classification where the classes represent user goals: "check_balance," "reset_password," "schedule_appointment." Topic categorization groups messages by subject matter: "billing," "technical," "general." Both are essential for agents that handle diverse user requests.

Building an Intent Classifier with scikit-learn

For agents with a known set of intents and training data, a traditional ML classifier is fast and reliable.

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import joblib

training_data = [
    ("What is my account balance?", "check_balance"),
    ("How much do I owe?", "check_balance"),
    ("I forgot my password", "reset_password"),
    ("Can't log in to my account", "reset_password"),
    ("I want to cancel my subscription", "cancel_subscription"),
    ("How do I unsubscribe?", "cancel_subscription"),
    ("Can I speak to a manager?", "escalate"),
    ("I need to talk to a real person", "escalate"),
]

texts, labels = zip(*training_data)

classifier = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=5000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

classifier.fit(texts, labels)

def classify_intent(text: str) -> dict:
    prediction = classifier.predict([text])[0]
    probabilities = classifier.predict_proba([text])[0]
    confidence = max(probabilities)
    return {"intent": prediction, "confidence": round(confidence, 3)}

print(classify_intent("I need to check how much money is in my account"))
# {'intent': 'check_balance', 'confidence': 0.82}

Zero-Shot Classification: No Training Data Required

When you cannot collect labeled training data — or when new intents appear frequently — zero-shot classification lets you define categories at inference time.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

def classify_zero_shot(
    text: str,
    candidate_labels: list[str],
    multi_label: bool = False,
) -> dict:
    result = zero_shot(
        text,
        candidate_labels=candidate_labels,
        multi_label=multi_label,
    )
    return {
        label: round(score, 3)
        for label, score in zip(result["labels"], result["scores"])
    }

labels = ["billing", "technical_support", "sales", "general_inquiry"]
message = "My invoice shows a charge I didn't authorize"

print(classify_zero_shot(message, labels))
# {'billing': 0.891, 'general_inquiry': 0.054,
#  'technical_support': 0.033, 'sales': 0.022}

Zero-shot models use natural language inference under the hood. They evaluate whether the text "entails" each candidate label. This means your label names matter — "billing_dispute" will perform differently than "billing" for the same input.

Multi-Label Classification

Users often express multiple intents in a single message: "I need to update my address and also check when my next payment is due." Multi-label classification assigns multiple categories simultaneously.

def classify_multi_intent(text: str, labels: list[str]) -> list[dict]:
    result = zero_shot(
        text,
        candidate_labels=labels,
        multi_label=True,
    )
    return [
        {"label": label, "score": round(score, 3)}
        for label, score in zip(result["labels"], result["scores"])
        if score > 0.5
    ]

labels = ["update_info", "check_payment", "billing", "technical"]
message = "Please change my address and tell me when my next bill is due"

print(classify_multi_intent(message, labels))
# [{'label': 'update_info', 'score': 0.92},
#  {'label': 'check_payment', 'score': 0.87},
#  {'label': 'billing', 'score': 0.61}]

Confidence-Based Routing

Raw classification output needs a decision layer. Here is a routing pattern that handles low-confidence predictions gracefully.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RouteDecision:
    target_agent: str
    confidence: float
    fallback_used: bool

class IntentRouter:
    def __init__(self, classifier, threshold: float = 0.7):
        self.classifier = classifier
        self.threshold = threshold
        self.agent_map = {
            "check_balance": "billing_agent",
            "reset_password": "auth_agent",
            "cancel_subscription": "retention_agent",
            "escalate": "human_agent",
        }

    def route(self, message: str) -> RouteDecision:
        result = self.classifier(message)
        intent = result["intent"]
        confidence = result["confidence"]

        if confidence >= self.threshold and intent in self.agent_map:
            return RouteDecision(
                target_agent=self.agent_map[intent],
                confidence=confidence,
                fallback_used=False,
            )

        return RouteDecision(
            target_agent="general_agent",
            confidence=confidence,
            fallback_used=True,
        )

The threshold is critical. Set it too low and misrouted messages frustrate users. Set it too high and too many messages fall through to the general agent. Start at 0.7 and adjust based on production metrics.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Combining Classification Approaches

Production agents often layer multiple classifiers. A fast keyword-based filter catches obvious cases, a traditional ML model handles most traffic, and a zero-shot model handles the long tail.

class HybridClassifier:
    def __init__(self, keyword_rules, ml_model, zero_shot_model):
        self.keyword_rules = keyword_rules
        self.ml_model = ml_model
        self.zero_shot_model = zero_shot_model

    def classify(self, text: str, labels: list[str]) -> dict:
        # Tier 1: keyword match (microseconds)
        for pattern, label in self.keyword_rules.items():
            if pattern.lower() in text.lower():
                return {"label": label, "confidence": 1.0, "tier": "keyword"}

        # Tier 2: ML model (milliseconds)
        ml_result = self.ml_model(text)
        if ml_result["confidence"] > 0.8:
            return {**ml_result, "tier": "ml"}

        # Tier 3: zero-shot (slower but flexible)
        zs_result = self.zero_shot_model(text, labels)
        top_label = max(zs_result, key=zs_result.get)
        return {
            "label": top_label,
            "confidence": zs_result[top_label],
            "tier": "zero_shot",
        }

FAQ

How many training examples do I need per intent for a reliable classifier?

For traditional ML classifiers like logistic regression with TF-IDF, aim for at least 50 to 100 examples per intent. For fine-tuned transformer models, 20 to 50 examples per class can be sufficient. If you have fewer than 20 examples, use zero-shot classification instead and gradually collect labeled data from production traffic to build a training set.

How do I handle intents that overlap semantically?

Overlapping intents are a design problem, not a model problem. If "check_balance" and "billing_inquiry" are hard to distinguish, consider merging them into a single intent and using a second-stage classifier or rule-based logic to differentiate subtypes. Alternatively, use multi-label classification and let the downstream agent handle both intents.

What is the best way to add new intents without retraining from scratch?

Use a zero-shot model as your primary classifier and maintain a mapping from zero-shot labels to agent routes. Adding a new intent is as simple as adding a new label string. Once you collect enough production examples for the new intent, fine-tune your ML model and redeploy. This gives you immediate coverage with zero-shot and improved accuracy over time with supervised learning.

#TextClassification #IntentDetection #NLP #ZeroShot #AIAgents #Python #AgenticAI #LearnAI #AIEngineering

Text Classification for Agent Routing: Intent Detection and Topic Categorization

Why Classification Is the Agent's First Decision

Building an Intent Classifier with scikit-learn

Zero-Shot Classification: No Training Data Required

Multi-Label Classification

Confidence-Based Routing

Combining Classification Approaches

FAQ

How many training examples do I need per intent for a reliable classifier?

How do I handle intents that overlap semantically?

What is the best way to add new intents without retraining from scratch?

Try CallSphere AI Voice Agents

Related Articles You May Like

Personal AI Assistant: How to Pick One for Business in 2026

Free AI Agents in 2026: When Free Wins and When It Costs You

Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

Chatbot App vs ChatGPT: What's the Difference, and Which Do I Need?

Building an HVAC After-Hours Emergency Escalation System: A Complete Engineering Guide

OpenAI Frontier vs Anthropic Managed Agents: 2026 Comparison