---
title: "Building a Diagram Understanding Agent: Flowcharts, Architecture Diagrams, and Charts"
description: "Create an AI agent that classifies diagram types, extracts elements and relationships from flowcharts and architecture diagrams, and converts visual diagrams into structured data and code representations."
canonical: https://callsphere.ai/blog/building-diagram-understanding-agent-flowcharts-architecture-charts
category: "Learn Agentic AI"
tags: ["Diagram Analysis", "Flowcharts", "Architecture Diagrams", "Visual Understanding", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T23:20:13.916Z
---

# Building a Diagram Understanding Agent: Flowcharts, Architecture Diagrams, and Charts

> Create an AI agent that classifies diagram types, extracts elements and relationships from flowcharts and architecture diagrams, and converts visual diagrams into structured data and code representations.

## Why Diagram Understanding Is Valuable

Technical documentation is full of diagrams — flowcharts describing business processes, architecture diagrams showing system components, sequence diagrams illustrating API interactions, and data flow charts mapping pipelines. An agent that can read and understand these diagrams can answer questions about system architecture, generate code from flowcharts, identify missing components, and convert visual documentation into machine-readable formats.

## Diagram Classification

The first step is identifying what type of diagram the agent is looking at, because each type requires a different extraction strategy:

```mermaid
flowchart LR
    IMG(["Diagram image"])
    CLASSIFY["Classify
diagram type"]
    EXTRACT["Extract nodes
and edges"]
    CHECK{"Structure
valid?"}
    CONVERT["Convert to
Mermaid / code"]
    OUT(["Structured
output"])
    IMG --> CLASSIFY --> EXTRACT --> CHECK
    CHECK -->|Pass| CONVERT --> OUT
    CHECK -->|Fail| EXTRACT
    style CLASSIFY fill:#4f46e5,stroke:#4338ca,color:#fff
    style CHECK fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
import openai
import base64
from pydantic import BaseModel
from enum import Enum

class DiagramType(str, Enum):
    FLOWCHART = "flowchart"
    ARCHITECTURE = "architecture"
    SEQUENCE = "sequence"
    ER_DIAGRAM = "er_diagram"
    DATA_FLOW = "data_flow"
    ORG_CHART = "org_chart"
    CHART = "chart"  # bar, line, pie
    UNKNOWN = "unknown"

class DiagramClassification(BaseModel):
    diagram_type: DiagramType
    confidence: float
    description: str

async def classify_diagram(
    image_bytes: bytes, client: openai.AsyncOpenAI
) -> DiagramClassification:
    """Classify the type of diagram in an image."""
    b64 = base64.b64encode(image_bytes).decode()

    response = await client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify this diagram. Identify the type, "
                    "your confidence level (0-1), and a brief "
                    "description of what the diagram shows."
                ),
            },
            {
                "role": "user",
                "content": [{
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{b64}"
                    },
                }],
            },
        ],
        response_format=DiagramClassification,
    )
    return response.choices[0].message.parsed
```
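Classification confidence is worth acting on before moving to extraction: a low-confidence label means the type-specific extraction prompt may be the wrong one. A minimal sketch of such a gate (the `resolve_type` helper and the 0.6 threshold are illustrative choices, not part of the API above; the enum is abbreviated):

```python
from enum import Enum

class DiagramType(str, Enum):
    FLOWCHART = "flowchart"
    ARCHITECTURE = "architecture"
    UNKNOWN = "unknown"

def resolve_type(
    diagram_type: DiagramType,
    confidence: float,
    threshold: float = 0.6,  # illustrative cutoff
) -> DiagramType:
    """Fall back to UNKNOWN when the classifier is unsure,
    so extraction uses the generic prompt instead of a
    possibly wrong type-specific hint."""
    if confidence < threshold:
        return DiagramType.UNKNOWN
    return diagram_type
```

Passing the resolved type into `extract_structure` then naturally selects the generic "Extract all elements and their relationships" fallback prompt.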

## Extracting Elements and Relationships

Once classified, extract the structural components. For flowcharts, this means nodes and edges. For architecture diagrams, it means components and connections:

```python
class DiagramNode(BaseModel):
    id: str
    label: str
    node_type: str  # process, decision, start, end, component
    properties: dict = {}

class DiagramEdge(BaseModel):
    source_id: str
    target_id: str
    label: str = ""
    edge_type: str = "directed"  # directed, bidirectional

class DiagramStructure(BaseModel):
    nodes: list[DiagramNode]
    edges: list[DiagramEdge]
    title: str = ""
    notes: list[str] = []

async def extract_structure(
    image_bytes: bytes,
    diagram_type: DiagramType,
    client: openai.AsyncOpenAI,
) -> DiagramStructure:
    """Extract nodes and edges from a diagram."""
    b64 = base64.b64encode(image_bytes).decode()

    type_hints = {
        DiagramType.FLOWCHART: (
            "This is a flowchart. Extract all process steps, "
            "decision points, start/end nodes, and the arrows "
            "connecting them. Use node types: process, decision, "
            "start, end, subprocess."
        ),
        DiagramType.ARCHITECTURE: (
            "This is an architecture diagram. Extract all system "
            "components (services, databases, queues, load "
            "balancers, etc.) and their connections. Use node "
            "types: service, database, queue, cache, gateway, "
            "client, external."
        ),
        DiagramType.SEQUENCE: (
            "This is a sequence diagram. Extract all participants "
            "as nodes and messages as edges in chronological order."
        ),
    }

    hint = type_hints.get(
        diagram_type,
        "Extract all elements and their relationships.",
    )

    response = await client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": hint},
            {
                "role": "user",
                "content": [{
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{b64}"
                    },
                }],
            },
        ],
        response_format=DiagramStructure,
    )
    return response.choices[0].message.parsed
```
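Model output is worth sanity-checking before any downstream conversion: the model can emit an edge whose endpoint was never listed as a node, or a node it forgot to connect. A standalone sketch of such a check (plain dicts stand in for the Pydantic models above; `check_structure` is a hypothetical helper, not part of any library):

```python
def check_structure(
    nodes: list[dict], edges: list[dict]
) -> list[str]:
    """Return human-readable problems in an extracted graph:
    edges pointing at unknown nodes, and nodes with no
    edges at all."""
    problems: list[str] = []
    node_ids = {n["id"] for n in nodes}
    connected: set[str] = set()
    for e in edges:
        for key in ("source_id", "target_id"):
            if e[key] not in node_ids:
                problems.append(
                    f"edge references unknown node {e[key]!r}"
                )
        connected.update((e["source_id"], e["target_id"]))
    # Nodes that appear in no edge are likely extraction misses.
    for nid in sorted(node_ids - connected):
        problems.append(f"node {nid!r} has no edges")
    return problems
```

If this returns problems, one option is to re-run `extract_structure` with the problem list appended to the prompt as corrective feedback.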

## Converting Diagrams to Code

One of the most powerful capabilities is converting a visual diagram into executable code or infrastructure-as-code:

```python
async def diagram_to_mermaid(
    structure: DiagramStructure,
    diagram_type: DiagramType,
) -> str:
    """Convert extracted diagram structure to Mermaid syntax."""
    if diagram_type == DiagramType.FLOWCHART:
        lines = ["flowchart TD"]
        for node in structure.nodes:
            shape = {
                "decision": f"{node.id}{{{node.label}}}",
                "start": f"{node.id}([{node.label}])",
                "end": f"{node.id}([{node.label}])",
                "process": f"{node.id}[{node.label}]",
            }.get(node.node_type, f"{node.id}[{node.label}]")
            lines.append(f"    {shape}")

        for edge in structure.edges:
            if edge.label:
                lines.append(
                    f"    {edge.source_id} -->|{edge.label}| "
                    f"{edge.target_id}"
                )
            else:
                lines.append(
                    f"    {edge.source_id} --> {edge.target_id}"
                )

        return "\n".join(lines)

    elif diagram_type == DiagramType.ARCHITECTURE:
        lines = ["flowchart LR"]
        for node in structure.nodes:
            icon = {
                "database": f"{node.id}[({node.label})]",
                "queue": f"{node.id}>{node.label}]",
                "service": f"{node.id}[{node.label}]",
            }.get(node.node_type, f"{node.id}[{node.label}]")
            lines.append(f"    {icon}")

        for edge in structure.edges:
            # Mermaid uses <--> for bidirectional links.
            arrow = (
                " <--> " if edge.edge_type == "bidirectional"
                else " --> "
            )
            lines.append(
                f"    {edge.source_id}{arrow}{edge.target_id}"
            )

        return "\n".join(lines)

    return "# Unsupported diagram type for Mermaid conversion"
```
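Mermaid is not the only useful target: the same node/edge structure maps directly onto Graphviz DOT, which renders well for dense architecture diagrams. A minimal sketch (plain dicts stand in for `DiagramStructure`; escaping is limited to double quotes, and the shape mapping is an illustrative choice):

```python
def structure_to_dot(
    nodes: list[dict], edges: list[dict], title: str = "G"
) -> str:
    """Render a node/edge structure as a Graphviz digraph;
    decision nodes become diamonds, start/end nodes ovals."""
    shapes = {"decision": "diamond", "start": "oval", "end": "oval"}
    lines = [f'digraph "{title}" {{']
    for n in nodes:
        shape = shapes.get(n.get("node_type", ""), "box")
        label = n["label"].replace('"', '\\"')
        lines.append(f'    {n["id"]} [label="{label}", shape={shape}];')
    for e in edges:
        attrs = f' [label="{e["label"]}"]' if e.get("label") else ""
        lines.append(f'    {e["source_id"]} -> {e["target_id"]}{attrs};')
    lines.append("}")
    return "\n".join(lines)
```

The resulting string can be rendered with the `dot` CLI or the `graphviz` Python package for the side-by-side comparison described in the FAQ.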

## The Diagram Agent

```python
class DiagramUnderstandingAgent:
    def __init__(self):
        self.client = openai.AsyncOpenAI()

    async def analyze(self, image_bytes: bytes) -> dict:
        classification = await classify_diagram(
            image_bytes, self.client
        )
        structure = await extract_structure(
            image_bytes, classification.diagram_type, self.client
        )
        mermaid = await diagram_to_mermaid(
            structure, classification.diagram_type
        )

        return {
            "type": classification.diagram_type.value,
            "description": classification.description,
            "nodes": len(structure.nodes),
            "edges": len(structure.edges),
            "structure": structure.model_dump(),
            "mermaid_code": mermaid,
        }

    async def ask(
        self, image_bytes: bytes, question: str
    ) -> str:
        b64 = base64.b64encode(image_bytes).decode()
        response = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{b64}"
                        },
                    },
                ],
            }],
        )
        return response.choices[0].message.content
```

## FAQ

### How accurate is GPT-4o at extracting diagram structures compared to dedicated diagram parsers?

For clean, well-formatted diagrams, GPT-4o extracts nodes and edges with approximately 90% accuracy. It excels at understanding context and labels but can miss precise spatial relationships in dense diagrams. Dedicated parsers like those in draw.io or Lucidchart have access to the underlying XML and achieve near-perfect accuracy on their own formats. Use vision models when you only have a screenshot or image of the diagram.

### Can this agent handle hand-drawn diagrams on whiteboards?

Yes, with reduced accuracy. GPT-4o can interpret hand-drawn flowcharts and architecture sketches, identifying boxes, arrows, and labels even when the drawing is rough. For best results, ensure the whiteboard photo has good lighting, minimal glare, and the handwriting is reasonably legible. The classification step still works well because the overall layout patterns — boxes connected by arrows — are recognizable regardless of drawing quality.

### How do I validate that the extracted structure is correct?

Convert the extracted structure to Mermaid or Graphviz and render it visually. Compare the rendered output against the original diagram. You can also automate validation by checking that every node has at least one edge (no orphan nodes), decision nodes have at least two outgoing edges, and start nodes have no incoming edges. These structural constraints catch most extraction errors.
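The structural constraints just listed translate into a few lines of degree counting. A sketch over plain `(id, type)` node tuples and `(source, target)` edge tuples (the `flowchart_lint` name is illustrative):

```python
def flowchart_lint(
    nodes: list[tuple[str, str]],   # (node_id, node_type)
    edges: list[tuple[str, str]],   # (source_id, target_id)
) -> list[str]:
    """Flag orphan nodes, non-branching decision nodes,
    and start nodes with incoming edges."""
    out_deg = {nid: 0 for nid, _ in nodes}
    in_deg = {nid: 0 for nid, _ in nodes}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
    issues = []
    for nid, ntype in nodes:
        if out_deg[nid] + in_deg[nid] == 0:
            issues.append(f"{nid}: orphan node")
        if ntype == "decision" and out_deg[nid] < 2:
            issues.append(f"{nid}: decision node does not branch")
        if ntype == "start" and in_deg[nid] > 0:
            issues.append(f"{nid}: start node has incoming edges")
    return issues
```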

