
Text-to-SQL Fundamentals: Converting Natural Language Questions to Database Queries

Learn what text-to-SQL is, how the architecture works from schema understanding to query generation, and why it is one of the most practical applications of large language models in enterprise software.

What Is Text-to-SQL?

Text-to-SQL is the task of converting a natural language question into a valid SQL query that can be executed against a relational database. Instead of requiring users to learn SQL syntax, they can ask questions like "How many orders were placed last month?" and receive both the generated query and its results.

This capability has been studied in NLP research for decades, but large language models have made it genuinely practical. Modern LLMs can parse complex questions, reason about database schemas, and produce syntactically correct SQL with high accuracy — often exceeding 85% on standard benchmarks.

The Text-to-SQL Pipeline

A complete text-to-SQL system involves four stages that run sequentially for each user question.


Stage 1: Schema Understanding. The system loads the database schema — table names, column names, data types, primary keys, and foreign key relationships. This schema context is essential because the LLM needs to know what tables and columns exist to write valid queries.

Stage 2: Question Analysis. The natural language question is parsed to identify the intent (aggregation, filtering, sorting), the entities being referenced, and any implicit constraints. The phrase "top customers", for example, implies descending ordering and a LIMIT clause.

Stage 3: Query Generation. The LLM receives the schema context and the analyzed question, then produces a SQL query. This is typically done through a carefully engineered prompt that includes the schema, instructions, and optionally a few examples.

Stage 4: Execution and Formatting. The generated SQL is validated, executed against the database, and the results are formatted into a human-readable response.
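A minimal sketch of Stage 4, assuming SQLite and a SELECT-only policy (the `run_query` helper and the pipe-delimited formatting are illustrative choices, not part of any specific library):

```python
import sqlite3

def run_query(db_path: str, sql: str, max_rows: int = 50) -> str:
    """Validate, execute, and format an AI-generated query (SELECT-only)."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(sql)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchmany(max_rows)  # cap result size
    finally:
        conn.close()
    # Format as a simple pipe-delimited table
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)
```

The SELECT-only check and the row cap are the two safeguards that matter most for a first version; production systems layer more validation on top.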

A Minimal Implementation

Here is the simplest possible text-to-SQL system using an LLM:


import openai
import sqlite3

def get_schema(db_path: str) -> str:
    """Extract CREATE TABLE statements from a SQLite database."""
    conn = sqlite3.connect(db_path)
    cursor = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL"
    )
    schema = "\n\n".join(row[0] for row in cursor.fetchall())
    conn.close()
    return schema

def text_to_sql(question: str, schema: str) -> str:
    """Convert a natural language question to SQL."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a SQL expert. Given the following database schema,
convert the user's question into a valid SQL query.

Schema:
{schema}

Return ONLY the SQL query, no explanation.""",
            },
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    sql = response.choices[0].message.content.strip()
    # Strip markdown fences the model may add despite the instructions
    if sql.startswith("```"):
        sql = sql.strip("`").removeprefix("sql").strip()
    return sql

# Usage
schema = get_schema("sales.db")
sql = text_to_sql("What are the top 5 products by revenue?", schema)
print(sql)

This produces a query like SELECT product_name, SUM(price * quantity) AS revenue FROM orders GROUP BY product_name ORDER BY revenue DESC LIMIT 5.

Why Schema Context Matters

The most common failure mode in text-to-SQL is hallucinated column names. Without schema context, an LLM might generate SELECT customer_name FROM users when the actual column is full_name in a table called customers. Providing the full CREATE TABLE statements eliminates most of these errors.
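A cheap guard against the hallucinated names that do slip through is to ask the database itself to compile the query without running it; in SQLite, prefixing a statement with EXPLAIN does exactly this (a sketch, assuming SQLite):

```python
import sqlite3
from typing import Optional

def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Compile the query without executing it; return an error message or None."""
    try:
        # EXPLAIN parses and plans the statement, so unknown tables
        # and columns fail here without touching any data
        conn.execute(f"EXPLAIN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)
```

The returned error message can be fed back to the LLM for a corrected attempt.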

Including foreign key relationships is equally important. When a user asks "Which customers have not placed any orders?", the LLM needs to know that orders.customer_id references customers.id to generate the correct LEFT JOIN with a NULL check.
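With that foreign key in the prompt, the expected query for that question looks like the following, shown here running on a toy in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id));
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1);
""")
# "Which customers have not placed any orders?" -> LEFT JOIN + NULL check
sql = """
SELECT c.full_name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL
"""
print(conn.execute(sql).fetchall())  # [('Grace',)]
```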

Challenges in Real-World Systems

Ambiguity. The question "Show me sales for last quarter" requires knowing the current date and what "quarter" means in the business context. Does "sales" mean revenue, order count, or unit volume?
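Resolving relative dates deterministically in code, rather than leaving them to the LLM, removes one source of ambiguity. A sketch, assuming "quarter" means calendar quarter (a business might define it differently):

```python
from datetime import date, timedelta

def last_quarter(today: date) -> tuple:
    """Return (start, end) of the calendar quarter before the one containing today."""
    q = (today.month - 1) // 3  # current quarter, 0-based
    year, prev_q = (today.year, q - 1) if q > 0 else (today.year - 1, 3)
    start = date(year, prev_q * 3 + 1, 1)
    # End is the day before the next quarter starts
    next_start = date(year + 1, 1, 1) if prev_q == 3 else date(year, prev_q * 3 + 4, 1)
    return start, next_start - timedelta(days=1)
```

The resolved dates can then be injected into the prompt as literal values, so the generated SQL contains concrete date bounds.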

Complex queries. Questions involving multiple aggregations, window functions, or correlated subqueries push the boundaries of current LLM accuracy. A question like "What percentage of each department's budget has been spent this fiscal year?" requires CTEs or subqueries that LLMs sometimes get wrong.
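As a concrete target for that budget question, a correct query on a toy schema (the schema here is an assumption for illustration) needs a LEFT JOIN, grouping, and a NULL-safe aggregation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, budget REAL);
CREATE TABLE expenses (dept_id INTEGER REFERENCES departments(id), amount REAL);
INSERT INTO departments VALUES (1, 'Engineering', 1000.0), (2, 'Marketing', 500.0);
INSERT INTO expenses VALUES (1, 250.0), (1, 250.0);
""")
sql = """
SELECT d.name,
       ROUND(100.0 * COALESCE(SUM(e.amount), 0) / d.budget, 1) AS pct_spent
FROM departments d
LEFT JOIN expenses e ON e.dept_id = d.id
GROUP BY d.id, d.name, d.budget
ORDER BY d.id
"""
print(conn.execute(sql).fetchall())  # [('Engineering', 50.0), ('Marketing', 0.0)]
```

LLMs frequently miss the COALESCE or drop departments with no expenses by using an INNER JOIN; both mistakes produce plausible-looking but wrong results.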

Performance. The generated SQL might be correct but inefficient. An LLM might produce a correlated subquery where a JOIN would perform better. Production systems need query analysis to catch these cases.
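In SQLite, EXPLAIN QUERY PLAN surfaces such issues before any rows are returned (a sketch; the scan-detection heuristic here is deliberately simplistic):

```python
import sqlite3

def plan_warnings(conn: sqlite3.Connection, sql: str) -> list:
    """Flag plan steps that scan a whole table instead of using an index."""
    # Each plan row is (id, parent, notused, detail); detail reads
    # like "SCAN t" (full scan) or "SEARCH t USING ..." (index lookup)
    plan = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    return [row[3] for row in plan if row[3].startswith("SCAN")]
```

A production system might reject or log queries whose plans contain full scans over large tables.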

FAQ

How accurate is text-to-SQL with current LLMs?

GPT-4 class models achieve 80-87% execution accuracy on the Spider benchmark, which tests across 200 different databases. On simpler single-table queries, accuracy often exceeds 95%. Production systems improve on these numbers by adding schema context, few-shot examples, and error correction loops.
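An error-correction loop is straightforward to sketch: execute the candidate query, and on failure feed the database error back to the generator for another attempt. Here `generate_fn` abstracts the LLM call so the loop itself has no API dependency (an illustrative structure, not any specific library's API):

```python
import sqlite3
from typing import Callable

def sql_with_retry(
    conn: sqlite3.Connection,
    question: str,
    generate_fn: Callable[[str], str],
    max_attempts: int = 3,
) -> list:
    """Generate, execute, and on error re-prompt with the error message."""
    prompt = question
    for _ in range(max_attempts):
        sql = generate_fn(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Append the failing query and error so the next attempt can self-correct
            prompt = f"{question}\nPrevious query failed: {sql}\nError: {exc}"
    raise RuntimeError("Could not produce a working query")
```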

Can text-to-SQL work with any database?

Yes. The approach is database-agnostic as long as you can extract the schema and execute queries. PostgreSQL, MySQL, SQLite, SQL Server, and BigQuery all work. You may need to adjust the system prompt to specify dialect-specific syntax like ILIKE in PostgreSQL versus LOWER() LIKE in MySQL.

Is text-to-SQL safe to run against production databases?

Not without safeguards. You must enforce read-only access, query complexity limits, and result size caps. Never allow DELETE, UPDATE, INSERT, or DDL statements from AI-generated queries. Use a read replica or a dedicated analytics database.
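A minimal guard layer along those lines might look like this (a sketch; real validators should parse the SQL with a proper parser such as the sqlglot library rather than rely on string checks, which can be bypassed):

```python
FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "create", "attach", "pragma")

def check_query(sql: str, max_length: int = 2000) -> None:
    """Reject anything that is not a single, bounded SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
    if not stripped.lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    # Crude whole-word check for write/DDL keywords
    if any(f" {kw} " in f" {stripped.lower()} " for kw in FORBIDDEN):
        raise ValueError("Forbidden keyword in query")
    if len(stripped) > max_length:
        raise ValueError("Query too long")
```

Even with such checks, connecting through a read-only database role remains the real line of defense; the validator only fails fast with a clearer error.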


#TextToSQL #NaturalLanguageProcessing #LLM #DatabaseQuerying #AgenticAI #SQLGeneration #AIEngineering

Written by

CallSphere Team

