The Supervisor Pattern: A Meta-Agent That Monitors and Corrects Other Agents

Build a Supervisor meta-agent that monitors worker agents, performs quality checks, triggers automatic retries, and escalates failures — ensuring reliable multi-agent system output.

The Need for Supervision in Agent Systems

When you deploy multiple AI agents for important tasks such as generating reports, answering customer questions, or writing code, their outputs are not always correct on the first attempt. Models hallucinate, misinterpret instructions, or produce incomplete results. The Supervisor pattern introduces a meta-agent whose sole job is to monitor worker agent outputs, evaluate their quality, and then approve them, request corrections, or escalate to a human.

This is analogous to a team lead reviewing work before it ships. The supervisor does not do the work itself — it judges whether the work meets quality standards.

Architecture Overview

The Supervisor pattern has three components:

  1. Worker Agents: perform the actual tasks
  2. Supervisor Agent: evaluates worker output against quality criteria
  3. Supervision Loop: orchestrates retry cycles with feedback

The following Mermaid flowchart shows how they interact:

flowchart TD
    INPUT(["Task input"])
    SUPER["Supervisor agent<br/>plans plus monitors"]
    W1["Worker 1<br/>research"]
    W2["Worker 2<br/>code"]
    W3["Worker 3<br/>writing"]
    CRITIC{"Output meets<br/>rubric?"}
    REWORK["Rework or<br/>retry path"]
    SHARED[("Shared scratchpad<br/>and memory")]
    OUT(["Final result"])
    INPUT --> SUPER
    SUPER --> W1 --> CRITIC
    SUPER --> W2 --> CRITIC
    SUPER --> W3 --> CRITIC
    W1 --> SHARED
    W2 --> SHARED
    W3 --> SHARED
    SHARED --> SUPER
    CRITIC -->|Pass| OUT
    CRITIC -->|Fail| REWORK --> SUPER
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CRITIC fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
    style SHARED fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

Implementation

from dataclasses import dataclass
from enum import Enum
from typing import Callable, Any
import openai

class Verdict(Enum):
    APPROVED = "approved"
    NEEDS_REVISION = "needs_revision"
    ESCALATE = "escalate"

@dataclass
class Review:
    verdict: Verdict
    feedback: str
    score: float  # 0.0 to 1.0

@dataclass
class SupervisionResult:
    final_output: Any
    attempts: int
    approved: bool
    reviews: list[Review]

class Supervisor:
    def __init__(
        self,
        quality_criteria: str,
        max_retries: int = 3,
        min_score: float = 0.7,
    ):
        self.quality_criteria = quality_criteria
        self.max_retries = max_retries
        self.min_score = min_score
        self.client = openai.OpenAI()

    def evaluate(self, task: str, output: str) -> Review:
        import json  # local import so this method works when copied on its own

        # Ask the reviewer model for a structured JSON verdict
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a quality reviewer. Evaluate the output "
                    "against these criteria:\n"
                    f"{self.quality_criteria}\n\n"
                    # Single-quoted so the JSON example can contain double quotes
                    'Return JSON: {"verdict": "approved|needs_revision'
                    '|escalate", "feedback": "...", "score": 0.0-1.0}'
                )},
                {"role": "user", "content": (
                    f"Task: {task}\n\nOutput to review:\n{output}"
                )},
            ],
            response_format={"type": "json_object"},
        )
        data = json.loads(response.choices[0].message.content)
        return Review(
            verdict=Verdict(data["verdict"]),  # raises ValueError on an unknown verdict
            feedback=data["feedback"],
            score=float(data["score"]),
        )

    def supervise(
        self,
        task: str,
        worker: Callable[[str, str | None], str],
    ) -> SupervisionResult:
        reviews: list[Review] = []
        feedback: str | None = None
        output: str | None = None  # stays None only if max_retries < 1

        for attempt in range(1, self.max_retries + 1):
            # Worker produces output
            output = worker(task, feedback)

            # Supervisor evaluates
            review = self.evaluate(task, output)
            reviews.append(review)

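            # Approve only when both the verdict and the score threshold pass
            if (review.verdict == Verdict.APPROVED
                    and review.score >= self.min_score):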
                return SupervisionResult(
                    final_output=output,
                    attempts=attempt,
                    approved=True,
                    reviews=reviews,
                )

            if review.verdict == Verdict.ESCALATE:
                return SupervisionResult(
                    final_output=output,
                    attempts=attempt,
                    approved=False,
                    reviews=reviews,
                )

            # Provide feedback for next attempt
            feedback = review.feedback
            print(f"Attempt {attempt} rejected "
                  f"(score: {review.score:.2f}). Retrying...")

        # Exhausted retries
        return SupervisionResult(
            final_output=output,
            attempts=self.max_retries,
            approved=False,
            reviews=reviews,
        )

Using the Supervisor

client = openai.OpenAI()

def writing_agent(task: str, feedback: str | None) -> str:
    messages = [
        {"role": "system",
         "content": "You are a technical writer. Write clear, "
                    "accurate content."},
        {"role": "user", "content": task},
    ]
    if feedback:
        messages.append(
            {"role": "user",
             "content": f"Previous attempt was rejected. "
                        f"Feedback: {feedback}. Please revise."}
        )
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages
    )
    return response.choices[0].message.content

supervisor = Supervisor(
    quality_criteria=(
        "1. Accuracy: No factual errors\n"
        "2. Completeness: Covers all aspects of the topic\n"
        "3. Clarity: Easy to understand for a developer audience\n"
        "4. Code examples: Includes working code if relevant"
    ),
    max_retries=3,
    min_score=0.8,
)

result = supervisor.supervise(
    task="Explain Python decorators with examples",
    worker=writing_agent,
)
print(f"Approved: {result.approved}, Attempts: {result.attempts}")

Escalation Strategy

When the supervisor sets the verdict to ESCALATE, it signals that the task requires human intervention — the error is beyond what automated retries can fix. Common escalation triggers include detecting contradictory requirements, safety-sensitive content, or outputs that score below a critical threshold even after multiple retries.

FAQ

Does the supervisor add significant latency and cost?

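Yes. Each supervision cycle adds one supervisor LLM call on top of the worker call. With max_retries=3, the worst case is 6 LLM calls (3 worker + 3 supervisor). To reduce cost, use a cheaper model for supervision (for example gpt-4o-mini) and reserve the more expensive model for the worker; in the Supervisor class above, that means changing the model argument inside evaluate. The loop returns as soon as an output is approved, so a task that passes on the first attempt costs only 2 calls.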
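
The arithmetic behind the worst case can be written down as a small helper. This is a sketch, not part of the implementation above: worst_case_calls and worst_case_cost are hypothetical names, and the per-call costs are placeholder numbers in arbitrary units, not real pricing.

```python
def worst_case_calls(max_retries: int) -> dict[str, int]:
    """Worst case for the supervision loop: every attempt is rejected,
    so each attempt costs one worker call plus one supervisor call."""
    return {
        "worker": max_retries,
        "supervisor": max_retries,
        "total": 2 * max_retries,
    }


def worst_case_cost(
    max_retries: int,
    worker_cost_per_call: float,
    supervisor_cost_per_call: float,
) -> float:
    """Upper bound on spend per task, given per-call costs you measure yourself."""
    calls = worst_case_calls(max_retries)
    return (
        calls["worker"] * worker_cost_per_call
        + calls["supervisor"] * supervisor_cost_per_call
    )


print(worst_case_calls(3))

# Placeholder costs in arbitrary units (assumed, not real pricing).
same_model = worst_case_cost(3, worker_cost_per_call=1.0, supervisor_cost_per_call=1.0)
cheaper_reviewer = worst_case_cost(3, worker_cost_per_call=1.0, supervisor_cost_per_call=0.25)
print(same_model, cheaper_reviewer)
```

With these placeholder numbers, a reviewer that costs a quarter as much per call brings the worst-case bound from 6.0 down to 3.75 units, while the number of calls stays at 6.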

Can the supervisor itself make mistakes?

Absolutely. The supervisor is also an LLM and can misjudge quality. Guard against this by keeping quality criteria extremely specific and measurable. Vague criteria like "be good" lead to inconsistent reviews. Concrete criteria like "must include at least one code example" and "must not exceed 500 words" produce reliable evaluations.
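
Criteria this concrete do not need an LLM at all, so one option is to check them in plain code before (or alongside) the LLM review. The sketch below is illustrative: rule_based_checks, FENCE, and max_words are hypothetical names, and the code-example check is a rough heuristic that assumes the worker returns Markdown with fenced code blocks.

```python
FENCE = "`" * 3  # Markdown code fence marker, built this way to keep the example readable


def rule_based_checks(output: str, max_words: int = 500) -> list[str]:
    """Return the list of violated criteria; an empty list means every check passed."""
    problems: list[str] = []

    word_count = len(output.split())
    if word_count > max_words:
        problems.append(f"Exceeds {max_words} words (found {word_count}).")

    # Rough heuristic (assumption): a fenced block counts as a code example.
    if FENCE not in output:
        problems.append("Must include at least one code example.")

    return problems


draft = (
    "Decorators wrap functions.\n"
    + FENCE + "python\n@timer\ndef f(): ...\n" + FENCE
)
print(rule_based_checks(draft))
print(rule_based_checks("word " * 501))
```

The returned strings can be passed to the worker as feedback directly, which saves a supervisor call whenever a draft fails a mechanical check.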

How do I prevent infinite loops between worker and supervisor?

The max_retries parameter is the hard stop. Additionally, track whether the score is improving across attempts. If the score stagnates or decreases after two retries, escalate immediately rather than burning through remaining attempts.
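
A stagnation check of this kind could look like the sketch below. The function name should_escalate_early and the patience parameter are hypothetical; the rule it applies (the last two retries failed to beat the best earlier score) is one reasonable reading of "stagnates or decreases", not the only one.

```python
def should_escalate_early(scores: list[float], patience: int = 2) -> bool:
    """Return True when the last `patience` attempts all failed to beat
    the best score seen before them.

    `scores` holds one review score per attempt, oldest first.
    """
    if len(scores) <= patience:
        # Not enough retries yet to judge a trend.
        return False
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before


print(should_escalate_early([0.5, 0.5, 0.4]))   # two retries, no improvement
print(should_escalate_early([0.4, 0.5, 0.6]))   # still improving
```

Inside the retry loop, you would call it with the scores collected so far after each review and return an unapproved result as soon as it reports True.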

