---
title: "Building a Whiteboard-to-Code Agent: Converting Hand-Drawn Diagrams to Working Software"
description: "Learn how to build an AI agent that recognizes hand-drawn diagrams on whiteboards, classifies shapes and connections, and generates working code including Mermaid diagrams, database schemas, and API stubs."
canonical: https://callsphere.ai/blog/building-whiteboard-to-code-agent-diagrams-working-software
category: "Learn Agentic AI"
tags: ["Whiteboard AI", "Diagram Recognition", "Code Generation", "Computer Vision", "Mermaid.js"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-06T01:02:46.020Z
---

# Building a Whiteboard-to-Code Agent: Converting Hand-Drawn Diagrams to Working Software

> Learn how to build an AI agent that recognizes hand-drawn diagrams on whiteboards, classifies shapes and connections, and generates working code including Mermaid diagrams, database schemas, and API stubs.

## From Sketch to Code in Seconds

Whiteboards are where software architecture happens. Teams sketch entity-relationship diagrams, flowcharts, system architectures, and UI wireframes during design sessions. But these diagrams typically die on the whiteboard — someone takes a photo, it gets buried in a Slack thread, and the knowledge is effectively lost.

A whiteboard-to-code agent changes this. It takes a photo of a whiteboard, identifies the shapes, arrows, and text, understands the diagram type, and produces working code artifacts: Mermaid diagrams for documentation, SQL schemas for databases, API route stubs, or even class definitions.

## Architecture of the Agent

The pipeline has four stages:

```mermaid
flowchart LR
    PHOTO(["Whiteboard
photo"])
    PRE["Image
preprocessing"]
    DETECT["Element
detection"]
    CLASSIFY["Semantic
classification"]
    GEN["Code
generation"]
    OUT(["Mermaid / SQL /
API stubs"])
    PHOTO --> PRE --> DETECT --> CLASSIFY --> GEN --> OUT
    style CLASSIFY fill:#4f46e5,stroke:#4338ca,color:#fff
    style GEN fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
```

1. **Image preprocessing** — clean up whiteboard photo artifacts
2. **Element detection** — find shapes (boxes, circles, diamonds) and connections (arrows, lines)
3. **Semantic classification** — determine diagram type and element meanings
4. **Code generation** — produce the appropriate code output
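
These four stages compose naturally as a sequential pipeline threading shared state from one step to the next. A minimal sketch of the pattern — `run_pipeline` and the `fake_*` stages here are illustrative placeholders, not the real functions built in the sections below:

```python
from typing import Any, Callable

Stage = Callable[[dict], dict]

def run_pipeline(image_path: str, stages: list[Stage]) -> dict:
    """Thread a shared state dict through each stage in order.

    Each stage reads what it needs from the state (e.g. "image",
    "elements") and returns only the keys it adds or updates.
    """
    state: dict[str, Any] = {"image_path": image_path}
    for stage in stages:
        state.update(stage(state))
    return state

# Placeholder stages showing the shape of the real ones
def fake_preprocess(state: dict) -> dict:
    return {"image": f"cleaned({state['image_path']})"}

def fake_detect(state: dict) -> dict:
    return {"elements": ["rectangle:Users", "rectangle:Orders"]}

result = run_pipeline("board.jpg", [fake_preprocess, fake_detect])
```

Passing a dict rather than chaining return values matters here because later stages need multiple earlier outputs (label extraction needs both the image and the detected elements).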

## Image Preprocessing for Whiteboards

Whiteboard photos have unique challenges: glare, perspective distortion, marker color variations, and erased-but-visible ghost text:

```python
import cv2
import numpy as np

def preprocess_whiteboard(image_path: str) -> np.ndarray:
    """Clean up a whiteboard photo for element detection."""
    img = cv2.imread(image_path)

    # Perspective correction: find the whiteboard boundary
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    contours, _ = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    if contours:
        largest = max(contours, key=cv2.contourArea)
        epsilon = 0.02 * cv2.arcLength(largest, True)
        approx = cv2.approxPolyDP(largest, epsilon, True)

        if len(approx) == 4:
            pts = approx.reshape(4, 2).astype(np.float32)
            # Order corners TL, TR, BR, BL so they match dst below;
            # approxPolyDP gives no guaranteed corner order
            s = pts.sum(axis=1)
            d = np.diff(pts, axis=1).ravel()
            pts = np.array([
                pts[np.argmin(s)], pts[np.argmin(d)],
                pts[np.argmax(s)], pts[np.argmax(d)],
            ], dtype=np.float32)
            width, height = 1200, 900
            dst = np.array([
                [0, 0], [width, 0],
                [width, height], [0, height]
            ], dtype=np.float32)
            matrix = cv2.getPerspectiveTransform(pts, dst)
            img = cv2.warpPerspective(img, matrix, (width, height))

    # Remove the white background: keep saturated pixels (colored ink)
    # plus dark pixels, so black marker strokes survive the mask
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    color_mask = cv2.inRange(hsv, (0, 30, 0), (180, 255, 255))
    dark_mask = cv2.inRange(hsv, (0, 0, 0), (180, 255, 80))
    mask = cv2.bitwise_or(color_mask, dark_mask)
    result = cv2.bitwise_and(img, img, mask=mask)

    return result
```
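
Glare hot-spots from overhead lighting can additionally be tamed before edge detection with a percentile contrast stretch — a numpy-only sketch (the percentile values are illustrative assumptions; OpenCV's CLAHE is a more robust alternative):

```python
import numpy as np

def reduce_glare(gray: np.ndarray, low_pct: float = 2.0,
                 high_pct: float = 98.0) -> np.ndarray:
    """Clip specular highlights with a percentile contrast stretch.

    Pixels above the high percentile (glare hot-spots) saturate to
    white; pixels below the low percentile saturate to black.
    """
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    stretched = (gray.astype(np.float32) - lo) / max(hi - lo, 1e-6)
    return (np.clip(stretched, 0.0, 1.0) * 255).astype(np.uint8)
```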

## Shape Detection and Classification

Detect individual shapes by finding contours and classifying them based on geometry:

```python
from dataclasses import dataclass, field
from enum import Enum

class ShapeType(Enum):
    RECTANGLE = "rectangle"
    CIRCLE = "circle"
    DIAMOND = "diamond"
    ARROW = "arrow"
    TEXT = "text"
    UNKNOWN = "unknown"

@dataclass
class DiagramElement:
    shape: ShapeType
    bbox: tuple  # (x, y, w, h)
    center: tuple  # (cx, cy)
    label: str = ""
    connections: list[int] = field(default_factory=list)

def detect_shapes(image: np.ndarray) -> list[DiagramElement]:
    """Detect and classify shapes in the preprocessed image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

    contours, _ = cv2.findContours(
        binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )

    elements = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if area < 500:  # skip marker noise and small specks
            continue

        x, y, w, h = cv2.boundingRect(contour)
        center = (x + w // 2, y + h // 2)

        epsilon = 0.02 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)

        perimeter = cv2.arcLength(contour, True)
        circularity = 4 * np.pi * area / (perimeter ** 2)

        if circularity > 0.8:
            shape = ShapeType.CIRCLE
        elif len(approx) == 4:
            aspect = w / float(h)
            angle = cv2.minAreaRect(contour)[-1]
            # roughly square but rotated ~45 degrees reads as a diamond
            if 0.8 < aspect < 1.2 and abs(angle) > 30:
                shape = ShapeType.DIAMOND
            else:
                shape = ShapeType.RECTANGLE
        else:
            shape = ShapeType.UNKNOWN

        elements.append(DiagramElement(
            shape=shape,
            bbox=(x, y, w, h),
            center=center,
        ))

    return elements
```

## Text Recognition Within Shapes

Extract the text label inside each detected shape:

```python
import pytesseract
from PIL import Image

def extract_shape_labels(
    image: np.ndarray,
    elements: list[DiagramElement]
) -> list[DiagramElement]:
    """Read text inside each detected shape."""
    for i, elem in enumerate(elements):
        x, y, w, h = elem.bbox
        padding = 5
        roi = image[
            max(0, y - padding):y + h + padding,
            max(0, x - padding):x + w + padding
        ]

        roi_pil = Image.fromarray(roi)
        text = pytesseract.image_to_string(
            roi_pil, config="--psm 6"
        ).strip()

        elem.label = text if text else f"Element_{i}"

    return elements
```
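
Raw Tesseract output from whiteboard photos is noisy, so labels benefit from a cleanup pass before they reach the code generator. A small hedged sketch — the character set dropped here is an assumption about common marker-stroke misreads, not a tuned list:

```python
import re

def clean_ocr_label(raw: str, fallback: str) -> str:
    """Normalize raw OCR text into a usable diagram label.

    Drops characters OCR often misreads on whiteboards (stray pipes,
    tildes, backticks), collapses whitespace, and falls back to a
    stable placeholder when nothing legible remains.
    """
    text = re.sub(r"[|~`]", "", raw)
    text = re.sub(r"\s+", " ", text).strip()
    text = text.strip("_-. ")
    return text if len(text) >= 2 else fallback
```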

## Connection Detection

Find arrows and lines that connect shapes:

```python
def detect_connections(
    elements: list[DiagramElement],
    image: np.ndarray
) -> list[tuple[int, int]]:
    """Detect which elements are connected by arrows or lines."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    lines = cv2.HoughLinesP(
        edges, 1, np.pi / 180,
        threshold=50, minLineLength=30, maxLineGap=10
    )

    connections = []
    if lines is None:
        return connections

    for line in lines:
        x1, y1, x2, y2 = line[0]

        start_elem = find_nearest_element(elements, (x1, y1))
        end_elem = find_nearest_element(elements, (x2, y2))

        if (start_elem is not None and end_elem is not None
                and start_elem != end_elem):
            connections.append((start_elem, end_elem))

    return list(set(connections))

def find_nearest_element(
    elements: list[DiagramElement],
    point: tuple,
    max_dist: float = 50.0
) -> int | None:
    """Find the element closest to a given point."""
    min_dist = float("inf")
    nearest = None

    for i, elem in enumerate(elements):
        dist = np.sqrt(
            (elem.center[0] - point[0]) ** 2 +
            (elem.center[1] - point[1]) ** 2
        )
        if dist < min_dist and dist <= max_dist:
            min_dist = dist
            nearest = i

    return nearest
```

## Generating Mermaid Diagrams

With elements and connections in hand, emit Mermaid syntax for documentation:

```python
def generate_mermaid(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]]
) -> str:
    """Generate Mermaid diagram syntax from detected elements."""
    lines = ["flowchart TD"]

    # Define nodes
    for i, elem in enumerate(elements):
        label = elem.label.replace('"', "'")
        if elem.shape == ShapeType.CIRCLE:
            lines.append(f'    N{i}(("{label}"))')
        elif elem.shape == ShapeType.DIAMOND:
            lines.append(f'    N{i}{{"{label}"}}')
        else:
            lines.append(f'    N{i}["{label}"]')

    # Define connections
    for start, end in connections:
        lines.append(f"    N{start} --> N{end}")

    return "\n".join(lines)
```
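
Since the intro promised Mermaid for documentation, the generated source can be dropped straight into a README or design doc, where GitHub and most doc tools render mermaid fences. A trivial helper (the function name is mine):

```python
def to_markdown_snippet(mermaid_src: str, title: str) -> str:
    """Wrap generated Mermaid source in a fenced block for docs."""
    fence = "`" * 3
    return f"## {title}\n\n{fence}mermaid\n{mermaid_src}\n{fence}\n"

snippet = to_markdown_snippet(
    'flowchart TD\n    N0["Users"] --> N1["Orders"]',
    "Checkout flow",
)
```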

## Generating SQL Schema from ER Diagrams

When the diagram is identified as an entity-relationship diagram, generate a SQL schema:

```python
from openai import OpenAI

def diagram_to_sql(
    elements: list[DiagramElement],
    connections: list[tuple[int, int]]
) -> str:
    """Use an LLM to generate SQL from detected ER diagram."""
    diagram_desc = "Entities:\n"
    for i, elem in enumerate(elements):
        diagram_desc += f"- {elem.label} ({elem.shape.value})\n"

    diagram_desc += "\nRelationships:\n"
    for start, end in connections:
        diagram_desc += (
            f"- {elements[start].label} -> "
            f"{elements[end].label}\n"
        )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Convert this ER diagram description into a PostgreSQL "
                "schema. Include primary keys, foreign keys, appropriate "
                "data types, and indexes. Only output SQL, no explanation."
            )},
            {"role": "user", "content": diagram_desc},
        ],
    )

    return response.choices[0].message.content
```
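
LLM output should not be trusted blindly before running migrations. A cheap sanity check that every detected entity got a table — case-insensitive substring matching is a deliberate simplification here; real validation would parse the SQL:

```python
def missing_tables(sql: str, entity_labels: list[str]) -> list[str]:
    """Return entity labels with no apparent table in the generated SQL.

    Naive check on LLM output: assumes table names are the entity
    labels lowercased with spaces replaced by underscores.
    """
    lowered = sql.lower()
    if "create table" not in lowered:
        return list(entity_labels)
    return [
        label for label in entity_labels
        if label.lower().replace(" ", "_") not in lowered
    ]
```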

## FAQ

### How well does this work with messy handwriting?

The accuracy depends heavily on handwriting legibility. Block letters in dark markers on a clean whiteboard work well — expect 85-90% text recognition accuracy. Cursive or small writing drops significantly. For critical diagrams, consider having users write labels in a structured way or adding a manual correction step before code generation.

### Can the agent distinguish between different diagram types automatically?

Yes, with LLM-powered classification. Send the detected shapes, their types, and connection patterns to an LLM and ask it to classify the diagram as a flowchart, ER diagram, sequence diagram, or architecture diagram. The shape distribution is a strong signal: many diamonds suggest a flowchart, all rectangles with labeled connections suggest an ER diagram.
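
That shape-distribution signal can also serve as a cheap heuristic pre-filter before (or alongside) the LLM call. A sketch — the thresholds are illustrative assumptions, not tuned values:

```python
from collections import Counter

def guess_diagram_type(shape_names: list[str]) -> str:
    """First-pass classification from the shape distribution alone."""
    counts = Counter(shape_names)
    total = max(len(shape_names), 1)
    if counts["diamond"] / total > 0.15:
        return "flowchart"       # decision nodes dominate
    if counts["rectangle"] == total and total > 1:
        return "er_diagram"      # nothing but entity boxes
    if counts["circle"] / total > 0.3:
        return "architecture"    # services often drawn as circles
    return "unknown"
```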

### How do I handle diagrams with multiple colors?

Color carries semantic meaning on whiteboards — red might mean errors, green might mean success paths. Preserve color information during preprocessing and pass it to the LLM as metadata. For example, annotate each element with its dominant color so the code generator can map red paths to error handlers and green paths to success flows.
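
One way to produce that per-element color metadata is a rough channel-spread heuristic over the element's crop — `dominant_color` and its thresholds are assumptions that would need tuning for real lighting conditions:

```python
import numpy as np

def dominant_color(roi: np.ndarray) -> str:
    """Label an element's dominant marker color from its BGR crop."""
    pixels = roi.reshape(-1, 3).astype(np.float32)
    spread = pixels.max(axis=1) - pixels.min(axis=1)
    inked = pixels[spread > 40]      # colored ink, not background/black
    if len(inked) == 0:
        return "neutral"             # whiteboard or black strokes only
    mean = inked.mean(axis=0)        # per-channel means, BGR order
    channels = {"blue": mean[0], "green": mean[1], "red": mean[2]}
    return max(channels, key=channels.get)
```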

---

#WhiteboardAI #DiagramRecognition #CodeGeneration #MermaidJS #ComputerVision #AgenticAI #Python #SoftwareDesign

---

Source: https://callsphere.ai/blog/building-whiteboard-to-code-agent-diagrams-working-software
