
Building a Bug Fixing Agent: Automated Diagnosis and Repair of Code Issues

Build an AI agent that takes error messages, traces through code to find root causes, generates fixes, and verifies them with regression tests. A practical guide to automated debugging.

From Error Message to Working Fix

When a bug is reported, a developer follows a predictable workflow: read the error, find the relevant code, understand what went wrong, write a fix, and verify it does not break anything else. A bug fixing agent automates this entire loop. It takes an error message or test failure as input, traces the root cause through your codebase, generates a patch, and runs your test suite to confirm the fix.

The Bug Fixing Pipeline

The agent operates in four phases: error analysis, code localization, fix generation, and regression testing.

The pipeline can be expressed as a small agent class. The scaffolding below defines the input report, the fix it produces, and a top-level method that chains the four phases:
import os
import subprocess
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()

@dataclass
class BugReport:
    error_message: str
    stack_trace: str | None = None
    failing_test: str | None = None
    reproduction_steps: str | None = None

@dataclass
class BugFix:
    file_path: str
    original_code: str
    fixed_code: str
    explanation: str
    root_cause: str
    confidence: float

class BugFixingAgent:
    def __init__(self, project_dir: str, model: str = "gpt-4o"):
        self.project_dir = project_dir
        self.model = model

    def diagnose_and_fix(self, report: BugReport) -> BugFix | None:
        root_cause = self._analyze_error(report)
        relevant_files = self._locate_code(root_cause, report)
        fix = self._generate_fix(root_cause, relevant_files, report)

        if fix and self._verify_fix(fix, report):
            return fix
        return None

Error Analysis: Understanding What Went Wrong

The first step is parsing the error to understand the category of bug and where it likely originates.

def _analyze_error(self, report: BugReport) -> dict:
    context = f"Error: {report.error_message}"
    if report.stack_trace:
        context += f"\n\nStack trace:\n{report.stack_trace}"
    if report.reproduction_steps:
        context += f"\n\nReproduction steps:\n{report.reproduction_steps}"

    response = client.chat.completions.create(
        model=self.model,
        messages=[
            {"role": "system", "content": """Analyze this error report and
identify the root cause. Return JSON with:
- "error_type": category (e.g., TypeError, logic_error, race_condition)
- "root_cause": one-sentence explanation of why this happens
- "likely_files": list of file patterns to search
- "likely_functions": list of function names involved
- "search_terms": list of strings to grep for in the codebase"""},
            {"role": "user", "content": context},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )

    import json
    return json.loads(response.choices[0].message.content)

The structured output tells the next phase exactly where to look in the codebase.
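For a concrete sense of the shape, the analysis for a hypothetical TypeError might come back as follows (illustrative values, not real model output):

```python
# Illustrative analysis dict for a hypothetical TypeError in an orders module
analysis = {
    "error_type": "TypeError",
    "root_cause": "process_order() receives order_id as str where int is expected",
    "likely_files": ["orders/*.py"],
    "likely_functions": ["process_order"],
    "search_terms": ["process_order", "order_id"],
}
```

The `search_terms` list feeds directly into the grep step in the next phase, while `likely_files` can narrow the search scope.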

Code Localization: Finding the Bug

With search terms from the analysis, the agent locates the relevant source files.

def _locate_code(self, analysis: dict, report: BugReport) -> dict:
    relevant_code = {}

    if report.stack_trace:
        for line in report.stack_trace.split("\n"):
            if "File " in line and self.project_dir in line:
                parts = line.strip().split('"')
                if len(parts) >= 2:
                    file_path = parts[1]
                    if os.path.exists(file_path):
                        with open(file_path) as f:
                            relevant_code[file_path] = f.read()

    for term in analysis.get("search_terms", []):
        result = subprocess.run(
            ["grep", "-rl", term, self.project_dir,
             "--include=*.py", "--exclude-dir=__pycache__"],
            capture_output=True, text=True,
        )
        for file_path in result.stdout.strip().split("\n"):
            if file_path and file_path not in relevant_code:
                with open(file_path) as f:
                    relevant_code[file_path] = f.read()

    return relevant_code

The agent combines stack trace file references with grep-based search to build a complete picture of the relevant code.
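One practical wrinkle: large located files can blow the prompt budget in the fix-generation phase. A simple mitigation is to cap each file's contribution, assuming a per-file character budget of your choosing:

```python
MAX_CHARS = 20_000  # assumed per-file budget; tune for your model's context window

def truncate_for_context(relevant_code: dict[str, str]) -> dict[str, str]:
    """Trim each located file so the combined fix-generation prompt stays small."""
    return {
        path: src if len(src) <= MAX_CHARS
        else src[:MAX_CHARS] + "\n# ... truncated ..."
        for path, src in relevant_code.items()
    }
```

A smarter variant would extract only the functions named in the analysis, but a hard cap is a reasonable first pass.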


Fix Generation

With the root cause understood and the relevant code loaded, the agent generates a targeted fix.

def _generate_fix(
    self, analysis: dict, code_files: dict, report: BugReport
) -> BugFix | None:
    files_context = ""
    for path, content in code_files.items():
        files_context += f"\n--- {path} ---\n{content}\n"

    response = client.chat.completions.create(
        model=self.model,
        messages=[
            {"role": "system", "content": """You are a senior developer
fixing a bug. Based on the error analysis and source code, generate
a minimal fix. Return JSON with:
- "file_path": which file to modify
- "original_code": exact code to replace (copy precisely)
- "fixed_code": the corrected code
- "explanation": what the fix does
- "root_cause": why the original code was wrong
- "confidence": 0.0 to 1.0 how confident you are

IMPORTANT: original_code must be an EXACT match of existing code.
Make the smallest change possible. Do not refactor unrelated code."""},
            {"role": "user", "content": (
                f"Error analysis: {analysis}\n\n"
                f"Error: {report.error_message}\n\n"
                f"Source files:\n{files_context}"
            )},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )

    import json
    data = json.loads(response.choices[0].message.content)
    try:
        return BugFix(**data)
    except TypeError:
        # Model returned missing or unexpected keys; treat as no usable fix
        return None
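Before applying anything, it is worth rendering the proposed change as a unified diff for logging or human review. A small helper using the standard library's difflib:

```python
import difflib

def render_diff(original_code: str, fixed_code: str, path: str) -> str:
    """Unified diff of a proposed fix, for logging or human review."""
    return "".join(difflib.unified_diff(
        original_code.splitlines(keepends=True),
        fixed_code.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))
```

Surfacing the diff alongside the model's explanation and confidence score makes it easy for a human to approve or reject the patch.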

Regression Testing

The fix is applied temporarily and the test suite runs to ensure nothing else breaks.

def _verify_fix(self, fix: BugFix, report: BugReport) -> bool:
    with open(fix.file_path) as f:
        original_content = f.read()

    if fix.original_code not in original_content:
        return False

    patched = original_content.replace(
        fix.original_code, fix.fixed_code, 1
    )

    try:
        with open(fix.file_path, "w") as f:
            f.write(patched)

        # Target the known failing test for speed; omit the append below
        # to run the full suite and catch regressions in other modules.
        cmd = ["python", "-m", "pytest", "--tb=short", "-q"]
        if report.failing_test:
            cmd.append(report.failing_test)

        result = subprocess.run(
            cmd, capture_output=True, text=True,
            timeout=120, cwd=self.project_dir,
        )
        return result.returncode == 0
    finally:
        with open(fix.file_path, "w") as f:
            f.write(original_content)

The original file is always restored in the finally block, so verification never mutates your codebase, even when the fix works. Applying the patch for real is a separate step that should run only when _verify_fix returns True, meaning the targeted tests pass.
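Once verification succeeds, a separate step writes the patch for real. A minimal apply helper, reusing the same exact-match guard (a sketch; names are illustrative):

```python
def apply_fix(file_path: str, original_code: str, fixed_code: str) -> bool:
    """Permanently apply a verified fix; returns False if the snippet is gone."""
    with open(file_path) as f:
        content = f.read()
    if original_code not in content:
        return False  # file changed since verification; do not guess
    with open(file_path, "w") as f:
        f.write(content.replace(original_code, fixed_code, 1))
    return True
```

Re-checking for the exact snippet protects against the file having changed between verification and application.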

FAQ

How do I prevent the agent from making changes that break other parts of the code?

The regression testing step catches this. Run the full test suite, not just the failing test. If any test that was passing before now fails, reject the fix. For extra safety, use git to create a temporary branch, apply the fix, and run tests in isolation.
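One way to make "no previously passing test may fail" concrete is to collect the set of failing test IDs before and after the patch and diff them (sketch):

```python
def new_regressions(failures_before: set[str], failures_after: set[str]) -> set[str]:
    """Test IDs that passed before the fix but fail once it is applied."""
    return failures_after - failures_before
```

A fix is acceptable only when this set is empty and the originally failing test now passes.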

What if the bug is in a dependency rather than my own code?

The error analysis step should detect this. If all search terms point to code inside site-packages or a third-party library, the agent reports that the root cause is external and suggests a workaround or version pin instead of trying to modify library code.
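A simple path check catches the common case, treating any traceback frame under site-packages or outside the project root as external (a sketch; adjust for your virtualenv layout):

```python
import os

def is_external(frame_path: str, project_dir: str) -> bool:
    """True when a traceback frame points at dependency code, not the project."""
    norm = os.path.abspath(frame_path)
    return "site-packages" in norm or not norm.startswith(os.path.abspath(project_dir))
```

If every frame in the trace is external, the agent should report the dependency as the likely root cause rather than attempt a patch.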

Can this agent handle intermittent or flaky bugs?

Intermittent bugs like race conditions are harder because they may not reproduce on a single test run. For these, extend the agent to run the failing test multiple times and to analyze thread or async patterns in the code. The analysis prompt can specifically look for shared mutable state, missing locks, or unguarded async operations.
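To quantify flakiness, run the test repeatedly and measure its pass rate. In the sketch below the runner is injected as a callable so the measurement logic stays testable (names are illustrative):

```python
import subprocess

def run_pytest_once(test_id: str, project_dir: str) -> bool:
    """Single pytest invocation; True if the test passed."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-q", test_id],
        cwd=project_dir, capture_output=True, text=True,
    )
    return result.returncode == 0

def pass_rate(run_once, runs: int = 10) -> float:
    """Fraction of runs that pass; anything below 1.0 suggests flakiness."""
    passes = sum(1 for _ in range(runs) if run_once())
    return passes / runs
```

A pass rate strictly between 0 and 1 is strong evidence of an intermittent bug, which should steer the analysis prompt toward shared state and concurrency patterns.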


#BugFixing #AIAgents #Python #Debugging #AutomatedRepair #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team
