The Supervisor Pattern: A Meta-Agent That Monitors and Corrects Other Agents

The Need for Supervision in Agent Systems

When you deploy multiple AI agents that perform important tasks (generating reports, answering customer questions, writing code), the outputs are not always correct on the first attempt. Models hallucinate, misinterpret instructions, or produce incomplete results. The Supervisor pattern introduces a meta-agent whose sole job is to monitor worker agent outputs, evaluate their quality, and then approve the output, request corrections, or escalate to a human.

This is analogous to a team lead reviewing work before it ships. The supervisor does not do the work itself — it judges whether the work meets quality standards.

Architecture Overview

The Supervisor pattern has three components:

flowchart TD
    INPUT(["Task input"])
    SUPER["Supervisor agent<br/>plans plus monitors"]
    W1["Worker 1<br/>research"]
    W2["Worker 2<br/>code"]
    W3["Worker 3<br/>writing"]
    CRITIC{"Output meets<br/>rubric?"}
    REWORK["Rework or<br/>retry path"]
    SHARED[("Shared scratchpad<br/>and memory")]
    OUT(["Final result"])
    INPUT --> SUPER
    SUPER --> W1 --> CRITIC
    SUPER --> W2 --> CRITIC
    SUPER --> W3 --> CRITIC
    W1 --> SHARED
    W2 --> SHARED
    W3 --> SHARED
    SHARED --> SUPER
    CRITIC -->|Pass| OUT
    CRITIC -->|Fail| REWORK --> SUPER
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CRITIC fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
    style SHARED fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

Worker Agents — Perform the actual tasks
Supervisor Agent — Evaluates worker output against quality criteria
Supervision Loop — Orchestrates retry cycles with feedback

Implementation

import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable

import openai

class Verdict(Enum):
    APPROVED = "approved"
    NEEDS_REVISION = "needs_revision"
    ESCALATE = "escalate"

@dataclass
class Review:
    verdict: Verdict
    feedback: str
    score: float  # 0.0 to 1.0

@dataclass
class SupervisionResult:
    final_output: Any
    attempts: int
    approved: bool
    reviews: list[Review]

class Supervisor:
    def __init__(
        self,
        quality_criteria: str,
        max_retries: int = 3,
        min_score: float = 0.7,
    ):
        self.quality_criteria = quality_criteria
        self.max_retries = max_retries
        self.min_score = min_score
        self.client = openai.OpenAI()

    def evaluate(self, task: str, output: str) -> Review:
        # Ask the reviewer model for a structured verdict on the worker output
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a quality reviewer. Evaluate the output "
                    "against these criteria:\n"
                    f"{self.quality_criteria}\n\n"
                    # Single-quoted literals so the JSON example can
                    # contain double quotes
                    'Return JSON: {"verdict": "approved|needs_revision'
                    '|escalate", "feedback": "...", "score": 0.0-1.0}'
                )},
                {"role": "user", "content": (
                    f"Task: {task}\n\nOutput to review:\n{output}"
                )},
            ],
            response_format={"type": "json_object"},
        )
        data = json.loads(response.choices[0].message.content)
        return Review(
            verdict=Verdict(data["verdict"]),
            feedback=data["feedback"],
            score=float(data["score"]),
        )

    def supervise(
        self,
        task: str,
        worker: Callable[[str, str | None], str],
    ) -> SupervisionResult:
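        reviews: list[Review] = []
        feedback: str | None = None
        output: str | None = None  # stays None only if max_retries < 1

        for attempt in range(1, self.max_retries + 1):
            # Worker produces output (feedback is None on the first attempt)
            output = worker(task, feedback)

            # Supervisor evaluates
            review = self.evaluate(task, output)
            reviews.append(review)

            # Approve only when the verdict and the score threshold agree
            if (review.verdict == Verdict.APPROVED
                    and review.score >= self.min_score):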
                return SupervisionResult(
                    final_output=output,
                    attempts=attempt,
                    approved=True,
                    reviews=reviews,
                )

            if review.verdict == Verdict.ESCALATE:
                return SupervisionResult(
                    final_output=output,
                    attempts=attempt,
                    approved=False,
                    reviews=reviews,
                )

            # Provide feedback for next attempt
            feedback = review.feedback
            print(f"Attempt {attempt} rejected "
                  f"(score: {review.score:.2f}). Retrying...")

        # Exhausted retries
        return SupervisionResult(
            final_output=output,
            attempts=self.max_retries,
            approved=False,
            reviews=reviews,
        )
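
The evaluate method above trusts the reviewer's reply: an unexpected verdict string or a missing key raises inside the supervision loop. One way to harden this is sketched below, assuming the same three verdict strings. The parse_review helper, and the choice to escalate on any unparseable reply, are illustrative assumptions rather than part of the implementation above.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    APPROVED = "approved"
    NEEDS_REVISION = "needs_revision"
    ESCALATE = "escalate"


@dataclass
class Review:
    verdict: Verdict
    feedback: str
    score: float  # 0.0 to 1.0


def parse_review(raw: str) -> Review:
    """Hypothetical helper: turn the reviewer's raw reply into a Review.

    Assumption: a reply that cannot be parsed is escalated to a human
    instead of raising inside the supervision loop.
    """
    try:
        data = json.loads(raw)
        verdict = Verdict(data["verdict"])
        score = float(data["score"])
        feedback = str(data.get("feedback", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        return Review(Verdict.ESCALATE, "Unparseable reviewer reply", 0.0)
    # Clamp the score into the documented 0.0 to 1.0 range
    score = max(0.0, min(1.0, score))
    return Review(verdict, feedback, score)
```

With a helper like this, evaluate could return parse_review(response.choices[0].message.content) in place of the inline json.loads and Review construction.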

Using the Supervisor

client = openai.OpenAI()

def writing_agent(task: str, feedback: str | None) -> str:
    messages = [
        {"role": "system",
         "content": "You are a technical writer. Write clear, "
                    "accurate content."},
        {"role": "user", "content": task},
    ]
    if feedback:
        messages.append(
            {"role": "user",
             "content": f"Previous attempt was rejected. "
                        f"Feedback: {feedback}. Please revise."}
        )
    response = client.chat.completions.create(
        model="gpt-4o", messages=messages
    )
    return response.choices[0].message.content

supervisor = Supervisor(
    quality_criteria=(
        "1. Accuracy: No factual errors\n"
        "2. Completeness: Covers all aspects of the topic\n"
        "3. Clarity: Easy to understand for a developer audience\n"
        "4. Code examples: Includes working code if relevant"
    ),
    max_retries=3,
    min_score=0.8,
)

result = supervisor.supervise(
    task="Explain Python decorators with examples",
    worker=writing_agent,
)
print(f"Approved: {result.approved}, Attempts: {result.attempts}")

Escalation Strategy

When the supervisor sets the verdict to ESCALATE, it signals that the task requires human intervention — the error is beyond what automated retries can fix. Common escalation triggers include detecting contradictory requirements, safety-sensitive content, or outputs that score below a critical threshold even after multiple retries.
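
What happens after an escalation is left to the caller. A minimal sketch of one possible hand-off is shown here, assuming an in-memory queue stands in for a real ticketing or review system. HumanReviewQueue and deliver_or_escalate are hypothetical names, and the function takes plain values (the approved flag and the last review's feedback) so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HumanReviewQueue:
    """Hypothetical in-memory stand-in for a ticketing or review system."""
    items: list[dict[str, Any]] = field(default_factory=list)

    def submit(self, task: str, output: Any, reason: str) -> None:
        # Record everything a human needs to pick up the case
        self.items.append({"task": task, "output": output, "reason": reason})


def deliver_or_escalate(
    task: str,
    output: Any,
    approved: bool,
    last_feedback: str,
    queue: HumanReviewQueue,
) -> Any:
    """Return approved output; otherwise queue the case and return None."""
    if approved:
        return output
    queue.submit(task, output, reason=last_feedback)
    return None
```

Because an escalated run and a run that exhausted its retries both come back with approved=False, the same hand-off covers both cases.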

FAQ

Does the supervisor add significant latency and cost?

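Yes. Each attempt adds one supervisor call on top of the worker call, so with max_retries=3 the worst case is 6 LLM calls (3 worker + 3 supervisor). Latency grows the same way, because each review has to wait for the worker output it judges. Mitigate cost by using a cheaper model for supervision (for example GPT-4o-mini instead of the gpt-4o hard-coded in evaluate) and reserving the expensive model for the worker. In practice, most outputs pass on the first or second attempt.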
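
The worst-case arithmetic can be written down directly. This is a small sketch: the per-call costs are caller-supplied placeholders, not real prices, and both function names are made up for illustration.

```python
def worst_case_calls(max_retries: int) -> int:
    """Each attempt makes one worker call and one supervisor call."""
    return 2 * max_retries


def worst_case_cost(
    max_retries: int,
    worker_cost_per_call: float,
    supervisor_cost_per_call: float,
) -> float:
    """Upper bound on spend when every attempt is rejected.

    Assumption: the per-call costs are rough averages supplied by the
    caller; real cost depends on token counts and current pricing.
    """
    return max_retries * (worker_cost_per_call + supervisor_cost_per_call)


print(worst_case_calls(3))  # 6
```

Plugging in a lower supervisor_cost_per_call shows how much a cheaper reviewer model saves in the worst case.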

Can the supervisor itself make mistakes?

Absolutely. The supervisor is also an LLM and can misjudge quality. Guard against this by keeping quality criteria extremely specific and measurable. Vague criteria like "be good" lead to inconsistent reviews. Concrete criteria like "must include at least one code example" and "must not exceed 500 words" produce reliable evaluations.
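
Measurable criteria such as these can also be checked in plain code before the LLM reviewer is called at all. The sketch below assumes that code examples are delimited by triple-backtick fences; rule_based_checks is a hypothetical helper, not part of the Supervisor class.

```python
FENCE = "`" * 3  # triple backtick, built this way to keep the example readable


def rule_based_checks(output: str, max_words: int = 500) -> list[str]:
    """Return a list of rule violations; an empty list means all rules pass."""
    problems = []
    if len(output.split()) > max_words:
        problems.append(f"Exceeds {max_words} words")
    # Assumption: a code example always appears inside a fenced block
    if FENCE not in output:
        problems.append("No code example found")
    return problems
```

Any violations found this way can be sent back to the worker as feedback without spending a supervisor call, and they do not depend on the reviewer model's judgment.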

How do I prevent infinite loops between worker and supervisor?

The max_retries parameter is the hard stop. Additionally, track whether the score is improving across attempts. If the score stagnates or decreases after two retries, escalate immediately rather than burning through remaining attempts.
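
The stagnation check can be sketched as a small function over the scores collected so far (for example, [r.score for r in reviews]). The name should_escalate_early, the patience of two retries, and the min_gain margin are illustrative assumptions.

```python
def should_escalate_early(
    scores: list[float],
    patience: int = 2,
    min_gain: float = 0.01,
) -> bool:
    """True when the last `patience` attempts failed to beat the earlier best.

    scores holds one review score per attempt, oldest first.
    """
    if len(scores) <= patience:
        return False  # not enough history to judge a trend
    best_before = max(scores[:-patience])
    recent_best = max(scores[-patience:])
    # Stagnating or decreasing: no recent attempt improved by min_gain
    return recent_best < best_before + min_gain
```

Calling this at the end of each loop iteration and returning an unapproved result when it is true stops the loop before max_retries is reached.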
