Building a Code Generation Pipeline with 5 Specialized Agents: Planner, Coder, Reviewer, Tester, Deployer

Build an end-to-end code generation pipeline using five specialized AI agents — Planner, Coder, Reviewer, Tester, and Deployer — with complete handoff data structures and orchestration logic in Python.

Why Specialized Agents Beat a Single Code Generator

A single LLM prompted with "write me a function" can produce code, but it cannot reliably plan architecture, enforce coding standards, write meaningful tests, and handle deployment. Each of these tasks demands a different reasoning mode. By splitting the pipeline into five specialized agents — Planner, Coder, Reviewer, Tester, and Deployer — each agent can be optimized for its specific responsibility with tailored prompts, tools, and evaluation criteria.

This is the same principle that makes software engineering teams effective: specialization with clear handoff contracts.

The Pipeline Architecture

User Request
    |
    v
[Planner Agent] --> Implementation Plan
    |
    v
[Coder Agent] --> Generated Code
    |
    v
[Reviewer Agent] --> Review Feedback (pass/fail)
    |                      |
    | (if fail)            | (if pass)
    v                      v
[Coder Agent]         [Tester Agent] --> Test Results (pass/fail)
  (revision)               |
                           v (if pass)
                    [Deployer Agent] --> Deployment Artifact

Handoff Data Structures

The key to a reliable pipeline is well-defined data structures at each handoff point.

from dataclasses import dataclass, field
from typing import List, Optional, Dict
from enum import Enum

class StageStatus(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"
    NEEDS_REVISION = "needs_revision"

@dataclass
class ImplementationPlan:
    summary: str
    files_to_create: List[str]
    files_to_modify: List[str]
    dependencies: List[str]
    architecture_notes: str
    acceptance_criteria: List[str]

@dataclass
class GeneratedCode:
    files: Dict[str, str]          # filename -> content
    dependencies_added: List[str]
    explanation: str
    revision_number: int = 1

@dataclass
class ReviewResult:
    status: StageStatus
    issues: List[Dict[str, str]]   # {"severity": ..., "description": ...}
    suggestions: List[str]
    score: float                   # 0.0 to 1.0

@dataclass
class TestResult:
    status: StageStatus
    test_files: Dict[str, str]     # filename -> test content
    tests_passed: int
    tests_failed: int
    coverage_percent: float
    failure_details: List[str]

@dataclass
class DeploymentArtifact:
    dockerfile: Optional[str]
    deployment_config: Optional[str]
    environment_variables: List[str]
    deploy_instructions: str
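
Because every handoff is a plain dataclass, it can be logged or persisted between stages as JSON. The sketch below is one way to do that, not part of the original pipeline: `handoff_to_json` is a hypothetical helper name, and the two types are redeclared (with a trimmed `StageStatus`) only so the example runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict, List


class StageStatus(Enum):
    PASSED = "passed"
    NEEDS_REVISION = "needs_revision"


@dataclass
class ReviewResult:
    status: StageStatus
    issues: List[Dict[str, str]]
    suggestions: List[str]
    score: float


def handoff_to_json(handoff) -> str:
    """Serialize a handoff dataclass, writing Enum members as their values.

    Hypothetical helper for illustration; the article does not define it.
    """
    def encode(value):
        if isinstance(value, Enum):
            return value.value
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(asdict(handoff), default=encode, sort_keys=True)


review = ReviewResult(
    status=StageStatus.NEEDS_REVISION,
    issues=[{"severity": "high", "description": "No input validation"}],
    suggestions=["Validate the payload before use"],
    score=0.4,
)

# Round trip: dataclass -> JSON string -> dataclass
payload = handoff_to_json(review)
restored = json.loads(payload)
restored["status"] = StageStatus(restored["status"])
roundtrip = ReviewResult(**restored)
```

Storing the payload for each stage gives you a replayable trace of the pipeline, which helps when debugging why a particular revision was rejected.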

Implementing Each Agent

The Planner Agent

The Planner analyzes the user's request and produces a structured implementation plan.

import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def planner_agent(user_request: str) -> ImplementationPlan:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a software architect. Given a feature request, "
                    "produce a structured implementation plan. Output JSON with "
                    "keys: summary, files_to_create, files_to_modify, "
                    "dependencies, architecture_notes, acceptance_criteria."
                ),
            },
            {"role": "user", "content": user_request},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return ImplementationPlan(**data)
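
Passing the model's JSON straight into `ImplementationPlan(**data)` raises a `TypeError` if the model adds an unexpected key or omits one. A more tolerant parser might look like the following sketch. `parse_plan` and `LIST_FIELDS` are hypothetical names, and the choice to default missing fields to empty values is an assumption rather than the article's behavior.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ImplementationPlan:
    summary: str
    files_to_create: List[str]
    files_to_modify: List[str]
    dependencies: List[str]
    architecture_notes: str
    acceptance_criteria: List[str]


# Fields that default to an empty list when the model omits them (assumption).
LIST_FIELDS = {
    "files_to_create",
    "files_to_modify",
    "dependencies",
    "acceptance_criteria",
}


def parse_plan(raw_json: str) -> ImplementationPlan:
    """Build a plan from model output, ignoring unknown keys."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("Planner output must be a JSON object")

    known = {f.name for f in fields(ImplementationPlan)}
    cleaned = {k: v for k, v in data.items() if k in known}

    # Fill in anything the model left out with an empty default.
    for name in known - set(cleaned):
        cleaned[name] = [] if name in LIST_FIELDS else ""

    # A plan without a summary is not usable downstream, so fail loudly.
    if not cleaned["summary"]:
        raise ValueError("Planner output is missing a summary")
    return ImplementationPlan(**cleaned)


plan = parse_plan(
    '{"summary": "Add health endpoint", '
    '"files_to_create": ["app/health.py"], "risk": "low"}'
)
```

Here the extra `risk` key is dropped and the omitted list fields become empty lists, so later stages can iterate over them without a `None` check.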

The Coder Agent

The Coder takes the plan and produces implementation code.

async def coder_agent(
    plan: ImplementationPlan,
    review_feedback: Optional[ReviewResult] = None,
    revision: int = 1,
) -> GeneratedCode:
    context = f"Implementation plan:\n{plan.summary}\n"
    context += f"Files to create: {plan.files_to_create}\n"
    context += f"Architecture: {plan.architecture_notes}\n"

    if review_feedback and review_feedback.status == StageStatus.NEEDS_REVISION:
        context += "\nReview feedback to address:\n"
        for issue in review_feedback.issues:
            context += f"- [{issue['severity']}] {issue['description']}\n"

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert software engineer. Generate production "
                    "code based on the plan. Return JSON with keys: files "
                    "(object mapping filename to content), dependencies_added "
                    "(list), explanation (string)."
                ),
            },
            {"role": "user", "content": context},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return GeneratedCode(**data, revision_number=revision)

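The Reviewer Agent

The Reviewer evaluates the generated code against the plan and its acceptance criteria, then returns a verdict, a list of issues, and a score.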

async def reviewer_agent(
    plan: ImplementationPlan, code: GeneratedCode
) -> ReviewResult:
    review_prompt = (
        f"Plan: {plan.summary}\n"
        f"Acceptance criteria: {plan.acceptance_criteria}\n"
        f"Code files: {json.dumps(code.files, indent=2)}"
    )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a senior code reviewer. Evaluate the code against "
                    "the plan. Check for bugs, security issues, missing edge "
                    "cases, and adherence to acceptance criteria. Return JSON: "
                    "status (passed/needs_revision), issues (list of objects "
                    "with severity and description), suggestions, score (0-1)."
                ),
            },
            {"role": "user", "content": review_prompt},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    data["status"] = StageStatus(data["status"])
    return ReviewResult(**data)
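
`StageStatus(data["status"])` raises a `ValueError` whenever the model answers with something like "pass" or "Needs Revision" instead of the exact enum value. One possible guard is a small normalizer, sketched below. `parse_status` and the `STATUS_ALIASES` table are hypothetical, and treating unknown values as needing revision is a design assumption, not something the article specifies.

```python
from enum import Enum


class StageStatus(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"
    NEEDS_REVISION = "needs_revision"


# Hypothetical aliases; extend this from what your model actually emits.
STATUS_ALIASES = {
    "pass": "passed",
    "approved": "passed",
    "fail": "failed",
    "needs revision": "needs_revision",
}


def parse_status(raw: object) -> StageStatus:
    """Map a loosely formatted status string onto StageStatus.

    Unknown values fail closed: they are treated as NEEDS_REVISION so that
    unreviewed code never moves forward by accident.
    """
    text = str(raw).strip().lower().replace("-", "_")
    text = STATUS_ALIASES.get(text, text).replace(" ", "_")
    try:
        return StageStatus(text)
    except ValueError:
        return StageStatus.NEEDS_REVISION
```

For example, `parse_status("Needs Revision")` and `parse_status("lgtm")` both return `StageStatus.NEEDS_REVISION`, while `parse_status("PASSED")` returns `StageStatus.PASSED`.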

The Pipeline Orchestrator

import json

async def run_pipeline(user_request: str, max_revisions: int = 3):
    # Step 1: Plan
    plan = await planner_agent(user_request)
    print(f"Plan: {plan.summary}")

    code = None
    review = None
    for revision in range(1, max_revisions + 1):
        # Step 2: Code (or revise)
        code = await coder_agent(plan, review, revision)
        print(f"Code generated (revision {revision})")

        # Step 3: Review
        review = await reviewer_agent(plan, code)
        print(f"Review score: {review.score}")

        if review.status == StageStatus.PASSED:
            break
    else:
        print("Max revisions reached — proceeding with best effort")

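    # Step 4: Test. tester_agent is not defined in this article; it is assumed
    # to take the plan and code and return a TestResult.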
    test_result = await tester_agent(plan, code)
    if test_result.status == StageStatus.FAILED:
        print(f"Tests failed: {test_result.failure_details}")
        return None

    # Step 5: Deploy. deployer_agent is not defined in this article; it is
    # assumed to return a DeploymentArtifact.
    artifact = await deployer_agent(code)
    print(f"Deployment ready: {artifact.deploy_instructions}")
    return artifact
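
The orchestrator calls `tester_agent`, which the article does not define. Whatever generates the tests, a Tester eventually has to execute them. The sketch below shows one way that step could work, under several assumptions: `run_generated_tests` is a hypothetical helper, the tests are written for the standard-library `unittest` runner, and a temporary directory is used only for isolation of files, not as a security sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def run_generated_tests(
    code_files: Dict[str, str],
    test_files: Dict[str, str],
    timeout: int = 60,
) -> Tuple[bool, str]:
    """Write code and tests to a temp directory and run unittest discovery.

    Returns (all_passed, combined_output). This does NOT sandbox the code:
    generated code runs with the same privileges as the caller.
    """
    with tempfile.TemporaryDirectory() as workdir:
        for name, content in {**code_files, **test_files}.items():
            path = Path(workdir) / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content, encoding="utf-8")

        # Discovery picks up files matching test*.py in the start directory.
        proc = subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", workdir],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.returncode == 0, proc.stdout + proc.stderr


code = {"calc.py": "def add(a, b):\n    return a + b\n"}
tests = {
    "test_calc.py": (
        "import unittest\n"
        "from calc import add\n\n"
        "class AddTest(unittest.TestCase):\n"
        "    def test_add(self):\n"
        "        self.assertEqual(add(2, 3), 5)\n"
    )
}
passed, output = run_generated_tests(code, tests)
```

The boolean maps naturally onto `StageStatus.PASSED` or `StageStatus.FAILED`, and the captured output can populate `failure_details`. For untrusted generated code, running this inside a container or other isolated environment would be the safer choice.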

The review loop is the key quality mechanism — the Coder receives specific feedback from the Reviewer and addresses it in the next revision, mimicking a real code review cycle.

FAQ

Why not use a single agent with all five capabilities?

Specialization allows each agent to have focused system prompts, specific tool access, and tailored output schemas. The Reviewer agent, for example, should never have access to the deployment tools. Separation of concerns also lets you swap individual agents (e.g., upgrade only your Tester agent) without affecting the rest of the pipeline.
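
One way to make the tool boundary explicit is an allowlist that is checked before any tool call. This is a minimal sketch: the tool names and the `authorize` function are hypothetical and are not part of the pipeline shown earlier.

```python
from typing import Dict, FrozenSet

# Hypothetical tool names, for illustration only.
AGENT_TOOLS: Dict[str, FrozenSet[str]] = {
    "planner": frozenset({"read_repo"}),
    "coder": frozenset({"read_repo", "write_file"}),
    "reviewer": frozenset({"read_repo", "run_linter"}),
    "tester": frozenset({"read_repo", "run_tests"}),
    "deployer": frozenset({"build_image", "deploy"}),
}


def authorize(agent: str, tool: str) -> None:
    """Raise PermissionError unless the agent is allowed to use the tool."""
    if tool not in AGENT_TOOLS.get(agent, frozenset()):
        raise PermissionError(f"{agent} may not call {tool}")


authorize("deployer", "deploy")      # allowed, returns None
# authorize("reviewer", "deploy")    # would raise PermissionError
```

Keeping the table in one place also documents, at a glance, what each agent is able to touch.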

How do I handle the Coder agent producing code that never passes review?

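Set a maximum revision count (typically 2-3). If the code still fails review after the final revision, log the failure with the review feedback, alert a human, and halt the pipeline. Note that the orchestrator shown earlier takes the more lenient path and proceeds to testing on a best-effort basis; halting is the stricter alternative. Clear acceptance criteria from the Planner make it more likely that code passes within the first couple of iterations.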
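
A halting version of the review loop could be sketched as follows. Everything here is illustrative: `ReviewLoopExhausted`, `review_loop`, and the stub agents are hypothetical stand-ins, and the simplified `Review` type replaces `ReviewResult` so the example runs without an API key.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class Review:
    passed: bool
    issues: List[str]


class ReviewLoopExhausted(Exception):
    """Raised when code still fails review after the last allowed revision."""

    def __init__(self, last_review: Review):
        super().__init__(f"Review still failing: {last_review.issues}")
        self.last_review = last_review


async def review_loop(
    write: Callable[[Optional[Review]], Awaitable[str]],
    review: Callable[[str], Awaitable[Review]],
    max_revisions: int = 3,
) -> str:
    """Alternate coding and reviewing; halt instead of shipping failed code."""
    if max_revisions < 1:
        raise ValueError("max_revisions must be at least 1")
    feedback: Optional[Review] = None
    for _ in range(max_revisions):
        code = await write(feedback)
        feedback = await review(code)
        if feedback.passed:
            return code
    # The caller can log last_review and alert a human here.
    raise ReviewLoopExhausted(feedback)


# Stub agents standing in for the real LLM-backed Coder and Reviewer.
async def stub_coder(feedback: Optional[Review]) -> str:
    return "v2" if feedback else "v1"


async def stub_reviewer(code: str) -> Review:
    ok = code == "v2"
    return Review(passed=ok, issues=[] if ok else ["missing validation"])


async def strict_reviewer(code: str) -> Review:
    return Review(passed=False, issues=["still wrong"])


result = asyncio.run(review_loop(stub_coder, stub_reviewer))
```

With the stubs above, the first draft is rejected and the second passes. Swapping in `strict_reviewer` raises `ReviewLoopExhausted`, whose `last_review` attribute carries the feedback to attach to the alert.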

Can I run some agents in parallel?

The Tester and Deployer agents are sequential — you cannot deploy untested code. However, if you have multiple independent code files, you can parallelize coding and reviewing across files. The Planner output should indicate which files are independent to enable this.
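
Per-file parallelism can be sketched with `asyncio.gather` and a semaphore that caps concurrent calls. In this example `code_one_file` is a hypothetical stand-in for a per-file Coder call, and the concurrency limit of 4 is an arbitrary assumption.

```python
import asyncio
from typing import Dict, List


async def code_one_file(filename: str) -> str:
    # Stand-in for a per-file Coder call; a real version would call the LLM.
    await asyncio.sleep(0.01)
    return f"# generated content for {filename}\n"


async def code_files_in_parallel(
    filenames: List[str], max_concurrent: int = 4
) -> Dict[str, str]:
    """Generate independent files concurrently, capped at max_concurrent."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(name: str) -> str:
        async with semaphore:
            return await code_one_file(name)

    # gather returns results in the same order as its inputs.
    contents = await asyncio.gather(*(bounded(n) for n in filenames))
    return dict(zip(filenames, contents))


files = asyncio.run(
    code_files_in_parallel(["models.py", "routes.py", "schemas.py"])
)
```

The semaphore keeps the pipeline within API rate limits when a plan lists many files. This only works for files the Planner has marked as independent; files that import from each other still need to be generated in dependency order.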

