
Text-to-SQL Error Correction: Self-Healing Queries That Fix Their Own Mistakes

Build a text-to-SQL system with automatic error detection, self-correction loops, and intelligent retry strategies that fix query mistakes without user intervention.

Why Error Correction Matters

Even the best text-to-SQL systems produce invalid queries 10-20% of the time. Column name mismatches, syntax errors, incorrect data type comparisons, and ambiguous GROUP BY clauses are common failure modes. Without error correction, these failures become dead ends for the user.

A self-healing query system catches execution errors, feeds them back to the LLM with the original context, and asks for a corrected version. This simple loop recovers from 60-80% of first-attempt failures, pushing overall system accuracy from 85% to 95%+.

Error Categories in AI-Generated SQL

Understanding error types helps you build targeted correction strategies:


Syntax errors — Missing parentheses, incorrect keyword order, invalid function names. These produce immediate database errors with clear messages.

Schema errors — Referencing non-existent tables or columns. The error message includes the problematic identifier, making correction straightforward.
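These first two categories are easy to reproduce. The sketch below uses Python's built-in sqlite3 for illustration (rather than the PostgreSQL/psycopg2 setup used later, which needs a running server) to show that both kinds surface as exceptions whose messages name the problem; the `orders` table and the deliberate typos are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

def try_sql(sql: str):
    """Return the database error message, or None if the query ran."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.Error as e:
        return str(e)

# Syntax error: misspelled keyword triggers an immediate parse failure
syntax_err = try_sql("SELECT * FORM orders")

# Schema error: the message names the missing identifier,
# which is exactly what the correction prompt needs
schema_err = try_sql("SELECT amount FROM orders")
```

Because the schema error message contains the bad identifier (`no such column: amount`), feeding it back to the LLM alongside the schema usually yields a one-shot fix.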


Type errors — Comparing incompatible types, like WHERE date_column = 42. These may produce errors or silently return wrong results.
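The "silently return wrong results" case is worth seeing concretely. In this illustrative sqlite3 sketch (table and data invented), comparing a text date column to an integer raises no error at all; the predicate simply matches nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_date TEXT)")
conn.execute("INSERT INTO events VALUES (1, '2024-01-15')")

# No exception is raised: the integer never equals the text value,
# so the query quietly returns zero rows
rows = conn.execute(
    "SELECT * FROM events WHERE event_date = 42"
).fetchall()
```

No exception means the self-correction loop never fires on this class of mistake, which is why the result-validation heuristics later in this article exist.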

Logic errors — Valid SQL that returns incorrect results. The query runs successfully but answers the wrong question. These are the hardest to detect.

Building the Self-Correction Loop

```python
import openai
import psycopg2
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueryAttempt:
    sql: str
    error: Optional[str]
    results: Optional[list[dict]]

class SelfHealingSQLAgent:
    def __init__(self, conn_string: str, schema: str):
        self.conn_string = conn_string
        self.schema = schema
        self.client = openai.OpenAI()
        self.max_retries = 3

    def _generate(self, question: str,
                  previous_attempts: list[QueryAttempt]) -> str:
        """Generate SQL, incorporating error context from previous attempts."""
        system_msg = f"""You are a PostgreSQL expert. Generate a SQL query for
the user's question using this schema:

{self.schema}

Return ONLY the SQL query."""

        # Build error context from previous attempts
        if previous_attempts:
            error_context = "\n\nPrevious attempts that failed:\n"
            for i, attempt in enumerate(previous_attempts, 1):
                error_context += f"\nAttempt {i}:\n"
                error_context += f"SQL: {attempt.sql}\n"
                error_context += f"Error: {attempt.error}\n"
            error_context += "\nFix the issues and generate a corrected query."
            system_msg += error_context

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": question},
            ],
            temperature=0,
        )
        sql = response.choices[0].message.content.strip()
        # Models sometimes wrap SQL in markdown fences despite instructions
        if sql.startswith("```"):
            sql = sql.strip("`").removeprefix("sql").strip()
        return sql

    def _execute(self, sql: str) -> QueryAttempt:
        """Execute SQL and return the attempt result."""
        conn = None
        try:
            conn = psycopg2.connect(self.conn_string)
            cur = conn.cursor()
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            rows = [dict(zip(columns, row)) for row in cur.fetchall()]
            return QueryAttempt(sql=sql, error=None, results=rows)
        except Exception as e:
            return QueryAttempt(sql=sql, error=str(e), results=None)
        finally:
            if conn is not None:
                conn.close()  # close even on error; the naive version leaked connections

    def ask(self, question: str) -> dict:
        """Ask a question with automatic error correction."""
        attempts = []

        for retry in range(self.max_retries):
            sql = self._generate(question, attempts)
            attempt = self._execute(sql)
            attempts.append(attempt)

            if attempt.error is None:
                return {
                    "sql": sql,
                    "results": attempt.results,
                    "total_attempts": retry + 1,
                    "corrections": [a.error for a in attempts[:-1]],
                }

        # All retries exhausted
        return {
            "error": "Failed after all retry attempts",
            "attempts": [
                {"sql": a.sql, "error": a.error} for a in attempts
            ],
        }
```
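To watch the retry loop work without an API key or a Postgres instance, here is a hypothetical end-to-end run using sqlite3 and a stub standing in for the LLM. Everything here (`fake_generate`, the `sales` table, the deliberate typo) is invented for illustration; the loop structure mirrors `ask` above:

```python
import sqlite3

def fake_generate(question: str, attempts: list) -> str:
    """Stand-in for the LLM: broken query first, corrected once it sees an error."""
    if not attempts:
        return "SELECT revenu FROM sales"   # schema error on purpose
    return "SELECT revenue FROM sales"      # "corrected" second attempt

def execute(conn, sql: str) -> dict:
    try:
        rows = conn.execute(sql).fetchall()
        return {"sql": sql, "error": None, "results": rows}
    except sqlite3.Error as e:
        return {"sql": sql, "error": str(e), "results": None}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (revenue REAL)")
conn.execute("INSERT INTO sales VALUES (100.0)")

attempts = []
for retry in range(3):
    sql = fake_generate("What is total revenue?", attempts)
    attempt = execute(conn, sql)
    attempts.append(attempt)
    if attempt["error"] is None:
        break
```

The first attempt fails with `no such column: revenu`, the error lands in the history, and the second attempt succeeds: the same two-step recovery the real agent performs, just with the LLM call stubbed out.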

Detecting Logic Errors with Result Validation

Syntax and schema errors produce exceptions. Logic errors are harder because the query succeeds but returns wrong data. Use heuristic checks to catch common logic errors:

```python
class ResultValidator:
    """Detect potential logic errors in query results."""

    def validate(self, question: str, sql: str,
                 results: list[dict]) -> tuple[bool, str]:
        checks = [
            self._check_empty_results,
            self._check_unreasonable_counts,
            self._check_null_heavy_results,
            self._check_aggregation_mismatch,
        ]
        for check in checks:
            is_valid, reason = check(question, sql, results)
            if not is_valid:
                return False, reason
        return True, "Results look reasonable"

    def _check_empty_results(self, question: str, sql: str,
                             results: list[dict]) -> tuple[bool, str]:
        if not results:
            # Empty results might be correct, but flag for review
            return False, ("Query returned 0 rows. This may indicate an "
                           "overly restrictive WHERE clause or a wrong table.")
        return True, "OK"

    def _check_unreasonable_counts(self, question: str, sql: str,
                                   results: list[dict]) -> tuple[bool, str]:
        upper_q = question.upper()
        if any(phrase in upper_q for phrase in ("HOW MANY", "COUNT", "TOTAL")):
            # A count question should return a single aggregated row
            if len(results) > 1 and not any(
                "count" in col.lower() for col in results[0]
            ):
                return False, ("Question asks for a count but query returned "
                               "multiple rows without aggregation.")
        return True, "OK"

    def _check_null_heavy_results(self, question: str, sql: str,
                                  results: list[dict]) -> tuple[bool, str]:
        if results:
            null_count = sum(
                1 for row in results
                for v in row.values() if v is None
            )
            total_values = len(results) * len(results[0])
            if total_values > 0 and null_count / total_values > 0.5:
                return False, ("Over 50% of result values are NULL, "
                               "suggesting a wrong JOIN or missing data.")
        return True, "OK"

    def _check_aggregation_mismatch(self, question: str, sql: str,
                                    results: list[dict]) -> tuple[bool, str]:
        # Match whole words so "by" does not fire on "baby" or "nearby"
        words = set(question.upper().replace("?", " ").split())
        has_group_word = bool(words & {"EACH", "PER", "BY", "EVERY"})
        if has_group_word and len(results) <= 1:
            return False, ("Question implies grouping ('each', 'per', 'by') "
                           "but only 1 row returned. Missing GROUP BY?")
        return True, "OK"
```
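As a standalone illustration of how these heuristics behave, the sketch below reimplements two of the checks as free functions (the function names and sample rows are invented, not part of the class above):

```python
def check_empty(results: list[dict]) -> tuple[bool, str]:
    """Flag an empty result set for review."""
    if not results:
        return False, "Query returned 0 rows"
    return True, "OK"

def check_grouping(question: str, results: list[dict]) -> tuple[bool, str]:
    """Flag a single-row result when the question implies a GROUP BY."""
    words = set(question.upper().replace("?", " ").split())
    if words & {"EACH", "PER", "EVERY"} and len(results) <= 1:
        return False, "Question implies grouping but only 1 row returned"
    return True, "OK"

# "per region" implies grouping, but only one row came back: flagged
ok, reason = check_grouping("Total revenue per region?", [{"total": 500}])

# Multiple grouped rows pass the same check
ok2, _ = check_grouping("Total revenue per region?",
                        [{"region": "east", "total": 300},
                         {"region": "west", "total": 200}])
```

The flagged case (`ok` is False) is exactly the signal the agent turns into a synthetic error message for the next retry.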

Integrating Result Validation into the Agent

```python
class SelfHealingSQLAgentV2(SelfHealingSQLAgent):
    def __init__(self, conn_string: str, schema: str):
        super().__init__(conn_string, schema)
        self.result_validator = ResultValidator()

    def ask(self, question: str) -> dict:
        attempts = []

        for retry in range(self.max_retries):
            sql = self._generate(question, attempts)
            attempt = self._execute(sql)

            if attempt.error:
                attempts.append(attempt)
                continue

            # Validate results for logic errors
            is_valid, reason = self.result_validator.validate(
                question, sql, attempt.results
            )
            if not is_valid:
                attempt.error = f"Result validation: {reason}"
                attempts.append(attempt)
                continue

            attempts.append(attempt)
            return {
                "sql": sql,
                "results": attempt.results,
                "total_attempts": retry + 1,
            }

        return {"error": "All attempts failed", "attempts": len(attempts)}
```

FAQ

How many retries should I allow?

Three retries is the sweet spot. Most recoverable errors are fixed on the second attempt when the LLM sees the error message. A third attempt handles edge cases where the second fix introduces a new issue. Beyond three retries, the success rate drops sharply — if the model cannot fix it in three tries, it likely needs human intervention or a clearer question.

Does error correction increase latency significantly?

On average, 85% of queries succeed on the first attempt, so the typical user experiences no added latency. For the 15% that need correction, each retry adds 1-2 seconds (one LLM call plus one database query). The total worst-case for three retries is about 6 seconds, which is acceptable for analytical queries where users expect some processing time.
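The arithmetic behind those numbers, using the article's own figures as assumptions:

```python
# Assumed figures from the discussion above, not measured values
p_success_first = 0.85   # share of queries that never retry
per_retry_seconds = 2.0  # upper bound: one LLM call + one database query
max_retries = 3

# Worst case: every retry is used at the upper-bound cost
worst_case = max_retries * per_retry_seconds  # about 6 seconds

# Rough expected extra latency per query, assuming the 15% that fail
# are mostly fixed by a single correction
expected_extra = (1 - p_success_first) * per_retry_seconds  # about 0.3 seconds
```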

Can I use the error correction history to improve the model over time?

Yes. Log all correction events — the original question, the failed SQL, the error message, and the corrected SQL. This creates a training dataset of common mistakes. You can use it for few-shot examples in the prompt or for fine-tuning to reduce first-attempt errors over time.
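A minimal sketch of what one logged correction event might look like, serialized as a JSON line so the log is easy to mine later (the `CorrectionEvent` fields and sample values are illustrative, not from the agent code above):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CorrectionEvent:
    """One recovered failure: question, bad SQL, the error, and the fix."""
    question: str
    failed_sql: str
    error: str
    corrected_sql: str

event = CorrectionEvent(
    question="How many orders shipped last week?",
    failed_sql="SELECT COUNT(*) FROM order",
    error='relation "order" does not exist',
    corrected_sql="SELECT COUNT(*) FROM orders",
)

# One JSON object per line (JSONL) keeps the log append-only and
# trivially filterable into few-shot or fine-tuning examples
line = json.dumps(asdict(event))
```

Pairs of (`failed_sql`, `corrected_sql`) with their error messages make especially effective few-shot examples, because they show the model the exact mistake-to-fix transition for your schema.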


#ErrorCorrection #SelfHealing #TextToSQL #QueryRetry #AgenticAI #LLMDebugging #SQLFixing #AIReliability


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
