Agent Role Cards and Team Composition: Findings From 200 Enterprise Deployments
What enterprise multi-agent systems actually look like in 2026, based on deployment patterns across 200 production teams.
The Patterns That Repeat
Across 200+ enterprise multi-agent systems documented in 2025-2026 case studies, vendor write-ups, and conference talks, the team compositions are remarkably similar. The same five or six "role cards" show up over and over.
This piece names them, describes their typical responsibilities, and walks through how teams compose them.
The Six Common Role Cards
```mermaid
flowchart TB
    User --> Tri[Triage Agent]
    Tri --> Sp1[Specialist Agent A]
    Tri --> Sp2[Specialist Agent B]
    Sp1 --> Tool[Tool-Caller Agent]
    Sp2 --> Tool
    Tool --> Verif[Verifier Agent]
    Verif --> Mem[Memory Curator Agent]
    Mem --> Out[Output Composer Agent]
```
Triage Agent
Routes the inbound request. Lightweight, fast, often a smaller model. Decides which specialist (or specialists) handles the task. Almost universal — present in 95+ percent of enterprise multi-agent systems.
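A minimal sketch of the triage pattern: a cheap classifier picks a route before any expensive specialist runs. In production this is usually a small-model LLM call; a keyword table stands in for it here, and all route names and keywords are illustrative.

```python
# Triage sketch: a cheap routing step ahead of the specialists.
# A keyword table stands in for the small-model classifier that
# production systems typically use. Routes are illustrative.
ROUTES = {
    "billing": ["invoice", "charge", "refund"],
    "technical": ["error", "crash", "timeout"],
}

def triage(request: str) -> str:
    """Return the specialist route for an inbound request."""
    text = request.lower()
    for route, keywords in ROUTES.items():
        if any(kw in text for kw in keywords):
            return route
    return "general"  # fallback specialist for everything else
```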
Specialist Agent
Domain-focused. Has the right context, the right system prompt, and access to the relevant tools. Most systems have 2-8 specialists.
Tool-Caller Agent
A pattern that is becoming more common in 2026: a dedicated agent whose only job is to call tools cleanly. Specialists describe what they want; the tool-caller translates that into precise function calls. Reduces hallucination of tool arguments.
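One way to sketch the tool-caller idea: specialists hand over a loose intent, and a single agent validates it against a declared argument schema before the real function runs. The tool names and schema shapes below are illustrative, not from any specific framework.

```python
# Tool-caller sketch: validate a specialist's intent against a declared
# schema before invoking anything, so hallucinated arguments are
# rejected early instead of reaching a downstream API.
TOOLS = {
    "lookup_order": {
        "required": {"order_id": str},
        "fn": lambda order_id: {"order_id": order_id, "status": "shipped"},
    },
}

def call_tool(intent: dict) -> dict:
    """Translate a specialist's intent into a validated function call."""
    spec = TOOLS.get(intent.get("tool"))
    if spec is None:
        raise ValueError(f"unknown tool: {intent.get('tool')}")
    args = intent.get("args", {})
    for name, typ in spec["required"].items():
        if not isinstance(args.get(name), typ):
            raise ValueError(f"bad or missing argument: {name}")
    return spec["fn"](**args)
```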
Verifier Agent
Checks the proposed action or response against a rubric. Cheap, fast, always-on. Catches schema errors, policy violations, factual mismatches. Often a smaller, fast model.
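An always-on verifier can be as simple as a rubric pass over the proposed reply before it leaves the system. The sketch below checks schema, policy, and format; the rubric entries are illustrative.

```python
# Verifier sketch: a cheap rubric pass over a proposed reply.
# Returns a list of violations; empty means the reply passes.
def verify(reply: dict) -> list[str]:
    violations = []
    if "text" not in reply:                        # schema check
        violations.append("missing 'text' field")
    banned = ("guaranteed refund", "legal advice")
    text = reply.get("text", "").lower()
    for phrase in banned:                          # policy check
        if phrase in text:
            violations.append(f"policy violation: '{phrase}'")
    if len(text) > 2000:                           # format check
        violations.append("reply too long")
    return violations
```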
Memory Curator Agent
Asynchronous. Reads the run transcript and updates the long-term memory store with distilled facts, skills, and patterns. Runs on a queue, not in the user-facing loop.
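The queue-not-loop point can be sketched with a background worker: transcripts are enqueued after the run finishes, and distillation happens off the user-facing path. The `distill` function here is a trivial stand-in for an LLM summarizer; everything is illustrative.

```python
# Memory-curator sketch: transcripts go on a queue and a background
# worker distills them outside the user-facing loop.
import queue
import threading

transcripts: queue.Queue = queue.Queue()
memory_store: list[str] = []

def distill(transcript: str) -> str:
    # Stand-in for "read the run and extract a durable fact".
    return transcript.splitlines()[-1]

def curator_worker() -> None:
    while True:
        t = transcripts.get()
        if t is None:              # shutdown sentinel
            break
        memory_store.append(distill(t))
        transcripts.task_done()

worker = threading.Thread(target=curator_worker, daemon=True)
worker.start()
```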
Output Composer Agent
Final-pass formatter. Takes raw specialist outputs and produces the user-facing artifact (chat reply, email, document section). Owns the brand voice and formatting rules.
Composition Patterns
```mermaid
flowchart LR
    subgraph Slim[Slim: 3 agents]
        Tr1[Triage] --> Sp[Specialist] --> Out1[Composer]
    end
    subgraph Standard[Standard: 5 agents]
        Tr2[Triage] --> Sp2[Specialist] --> Tool2[Tool-Caller] --> Verif2[Verifier] --> Out2[Composer]
    end
    subgraph Full[Full: 7+ agents]
        Tr3[Triage] --> Multi[Multiple Specialists] --> Tool3[Tool-Caller] --> Verif3[Verifier] --> Out3[Composer]
        Multi --> Mem3[Memory Curator]
    end
```
Most enterprise systems land at or just above "Standard": five to seven agents. "Full" patterns appear when the use case spans many domains (a bank-wide assistant) or carries strict verification needs (healthcare, legal).
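The "Standard" composition can be sketched as an explicit pipeline of stages, where each callable stands in for one agent and the composition itself, not any single agent, is the system. All stage bodies below are illustrative stubs.

```python
# "Standard" composition sketch: an ordered pipeline of agent stages
# threading a shared state dict. Swapping one stage is how a single
# agent is versioned without touching the rest of the system.
def triage(state):
    state["route"] = "billing"
    return state

def specialist(state):
    state["draft"] = f"Order {state['request']} has shipped"
    return state

def tool_caller(state):
    state["data"] = {"status": "shipped"}   # stubbed API result
    return state

def verifier(state):
    assert state["data"]["status"] in {"shipped", "pending"}
    return state

def composer(state):
    return f"[{state['route']}] {state['draft']}."

PIPELINE = [triage, specialist, tool_caller, verifier]

def run(request: str) -> str:
    state = {"request": request}
    for stage in PIPELINE:
        state = stage(state)
    return composer(state)
```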
Anti-Patterns That Show Up Repeatedly
- The 12-agent monolith: every conceivable specialty as its own agent. Maintenance nightmare; specialists overlap; orchestration is brittle. Better: 4 broader specialists, each with sub-modes.
- The triage-only system: a smart triage that routes to a single big agent. Defeats the purpose of multi-agent; the big agent becomes a single point of failure.
- Verifier as afterthought: bolted on after the system shipped. Verification needs to be part of the design from day one to be useful.
- Memory as RAG: treating memory as just a vector store of past transcripts. Misses semantic and procedural memory completely.
Two Reference Compositions
Customer Service (Standard)
- Triage: route to billing, returns, technical, or general
- Specialist x4 (one per route)
- Tool-Caller: invokes order/account APIs cleanly
- Verifier: ensures policy compliance and tone
- Composer: produces the final reply
Research / Analytics (Full)
- Triage: classify the question
- Decomposer: break into sub-questions
- Specialist x3-5: one per sub-question
- Tool-Caller: handles search, database, code execution
- Verifier: fact-checks against sources
- Memory Curator: writes findings to durable memory
- Composer: produces the final report
Team Sizing Heuristics
Across the 200+ surveyed deployments, three heuristics hold up:
- Start with the 3-agent slim composition; add roles only when a measured failure mode demands it
- Never have an agent whose only job is to forward messages; that is plumbing, not an agent
- Each specialist should own a clear failure mode if removed; if you cannot name what would break, the specialist is redundant
What the Best Teams Do Differently
The teams whose multi-agent systems run reliably for 6+ months in production share three habits:
- They write role cards explicitly. Each agent has a one-page document describing scope, inputs, outputs, tools, and refuse-conditions. New team members can understand the system in an hour.
- They version each agent independently. Agent A on v3.2 of its prompt while agent B is on v1.7 is normal; system version is the composition, not a single artifact.
- They invest in eval per role. Each agent has its own evaluation suite. System-level evals on top of role-level evals.
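A role card written as data captures the habits above in one place: scope, I/O contract, tools, refuse-conditions, and an independently versioned prompt. The field names below are illustrative; the habit, not the schema, is the point.

```python
# Role-card sketch: one record per agent. The prompt_version field is
# per agent, matching the practice of versioning agents independently.
from dataclasses import dataclass, field

@dataclass
class RoleCard:
    name: str
    scope: str
    inputs: list[str]
    outputs: list[str]
    tools: list[str] = field(default_factory=list)
    refuse_conditions: list[str] = field(default_factory=list)
    prompt_version: str = "1.0"

billing = RoleCard(
    name="Billing Specialist",
    scope="Invoices, charges, and refunds for existing customers",
    inputs=["triaged request", "customer record"],
    outputs=["draft reply", "tool intents"],
    tools=["lookup_invoice", "issue_refund"],
    refuse_conditions=["requests about other customers' accounts"],
    prompt_version="3.2",
)
```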
Sources
- Anthropic agent design patterns — https://www.anthropic.com/research
- LangGraph multi-agent recipes — https://langchain-ai.github.io/langgraph
- "Multi-agent system patterns" 2026 review — https://arxiv.org/abs/2402.01680
- AutoGen design patterns — https://microsoft.github.io/autogen
- "Production multi-agent case studies 2026" — https://thenewstack.io