Skip to content
Building a File Organization Agent: AI-Powered Document Categorization and Filing
Learn Agentic AI13 min read21 views

Building a File Organization Agent: AI-Powered Document Categorization and Filing

Build an AI agent that scans directories, analyzes file content, categorizes documents by type and topic, and organizes them into a structured folder hierarchy with consistent naming conventions.

The Cost of Digital Disorganization

A typical shared drive accumulates thousands of files with names like "Final_v2_REVISED.docx" and "report copy (3).pdf." Finding the right document means searching through nested folders with inconsistent naming, duplicate files scattered across directories, and no clear taxonomy. An AI file organization agent solves this by analyzing file content, categorizing documents by type and topic, and filing them into a structured hierarchy.

This guide builds a complete file organization agent that scans directories, extracts content from multiple file types, uses an LLM for intelligent categorization, and reorganizes files with consistent naming.

Scanning and Extracting File Content

The agent needs to read content from various file types. We create extractors for the most common formats:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
from pathlib import Path
from dataclasses import dataclass
import mimetypes

@dataclass
class FileInfo:
    path: Path
    name: str
    extension: str
    size_bytes: int
    content_preview: str
    mime_type: str

def extract_text_content(filepath: Path, max_chars: int = 2000) -> str:
    """Extract text content from common file types."""
    ext = filepath.suffix.lower()

    if ext in (".txt", ".md", ".csv", ".log", ".json", ".yaml", ".yml"):
        return filepath.read_text(errors="replace")[:max_chars]

    if ext == ".pdf":
        import pymupdf
        doc = pymupdf.open(str(filepath))
        text = ""
        for page in doc:
            text += page.get_text()
            if len(text) > max_chars:
                break
        doc.close()
        return text[:max_chars]

    if ext in (".docx",):
        from docx import Document
        doc = Document(str(filepath))
        text = "\n".join(p.text for p in doc.paragraphs)
        return text[:max_chars]

    if ext in (".xlsx", ".xls"):
        import openpyxl
        wb = openpyxl.load_workbook(str(filepath), read_only=True)
        text = ""
        for sheet in wb.sheetnames[:3]:
            ws = wb[sheet]
            for row in ws.iter_rows(max_row=20, values_only=True):
                text += " ".join(str(c) for c in row if c) + "\n"
        return text[:max_chars]

    return ""

def scan_directory(directory: str, recursive: bool = True) -> list[FileInfo]:
    """Scan a directory and extract file information."""
    root = Path(directory)
    pattern = "**/*" if recursive else "*"
    files = []

    for filepath in root.glob(pattern):
        if filepath.is_file() and not filepath.name.startswith("."):
            content = extract_text_content(filepath)
            mime, _ = mimetypes.guess_type(str(filepath))
            files.append(FileInfo(
                path=filepath,
                name=filepath.name,
                extension=filepath.suffix.lower(),
                size_bytes=filepath.stat().st_size,
                content_preview=content,
                mime_type=mime or "application/octet-stream",
            ))

    return files

AI-Powered Categorization

The agent sends file metadata and content previews to an LLM for intelligent categorization. The model determines the document type, topic, and an appropriate filename:

from openai import OpenAI
import json

client = OpenAI()

CATEGORIES = {
    "contracts": "Legal agreements, NDAs, service contracts, amendments",
    "proposals": "Business proposals, RFPs, pitch decks",
    "invoices": "Invoices, receipts, purchase orders, billing statements",
    "reports": "Analytics reports, status updates, research findings",
    "correspondence": "Emails, letters, memos, meeting notes",
    "technical": "Architecture docs, API specs, runbooks, code reviews",
    "marketing": "Campaign materials, brand assets, social media content",
    "hr": "Employee records, policies, offer letters, reviews",
    "misc": "Files that do not fit other categories",
}

def categorize_file(file_info: FileInfo) -> dict:
    """Use LLM to categorize a file based on its content and metadata."""
    category_desc = "\n".join(f"- {k}: {v}" for k, v in CATEGORIES.items())

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You categorize files. Return JSON with:\n"
                    "- category: one of the categories below\n"
                    "- subcategory: a specific subcategory (e.g., 'nda' under contracts)\n"
                    "- suggested_name: a clean descriptive filename (lowercase, hyphens, no spaces)\n"
                    "- confidence: float 0-1\n"
                    "- summary: one sentence describing the file\n\n"
                    f"Categories:\n{category_desc}"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Filename: {file_info.name}\n"
                    f"Type: {file_info.mime_type}\n"
                    f"Size: {file_info.size_bytes} bytes\n\n"
                    f"Content preview:\n{file_info.content_preview[:1500]}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)

Building the Folder Structure

The agent creates a structured folder hierarchy based on categories and subcategories:

from datetime import datetime

def build_target_path(
    base_dir: str,
    category: str,
    subcategory: str,
    suggested_name: str,
    original_ext: str,
    year: int | None = None,
) -> Path:
    """Build a target path following the folder structure convention."""
    if year is None:
        year = datetime.now().year

    target_dir = Path(base_dir) / category / subcategory / str(year)
    target_dir.mkdir(parents=True, exist_ok=True)

    filename = f"{suggested_name}{original_ext}"
    target = target_dir / filename

    # Handle name collisions
    counter = 1
    while target.exists():
        target = target_dir / f"{suggested_name}-{counter}{original_ext}"
        counter += 1

    return target

Executing the Organization Plan

Before moving files, the agent generates a plan for human review. This prevents destructive mistakes:

import shutil
import logging

logger = logging.getLogger("file_agent")

@dataclass
class FilePlan:
    source: Path
    destination: Path
    category: str
    confidence: float
    summary: str

def create_organization_plan(
    source_dir: str, target_dir: str
) -> list[FilePlan]:
    """Scan files and create an organization plan without moving anything."""
    files = scan_directory(source_dir)
    plan = []

    for file_info in files:
        result = categorize_file(file_info)
        dest = build_target_path(
            target_dir,
            result["category"],
            result.get("subcategory", "general"),
            result["suggested_name"],
            file_info.extension,
        )
        plan.append(FilePlan(
            source=file_info.path,
            destination=dest,
            category=result["category"],
            confidence=result["confidence"],
            summary=result["summary"],
        ))

    return plan

def execute_plan(plan: list[FilePlan], min_confidence: float = 0.7):
    """Execute the organization plan, moving files above the confidence threshold."""
    for item in plan:
        if item.confidence < min_confidence:
            logger.warning(f"Skipping (low confidence {item.confidence}): {item.source}")
            continue
        item.destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(item.source), str(item.destination))
        logger.info(f"Moved: {item.source.name} -> {item.destination}")

The confidence threshold ensures that files the AI is unsure about remain untouched for manual review. Start with a high threshold like 0.85 and lower it as you validate accuracy.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

FAQ

How do I handle duplicate files during organization?

Compute a SHA-256 hash of each file's content before moving. Maintain a hash-to-path mapping and flag duplicates. Let the user choose which copy to keep. For near-duplicates like different versions of the same document, compare filenames and modification dates to identify the most recent version.

What about files the AI cannot read, like images or videos?

For images, use an LLM with vision capabilities to describe the content. For videos, extract metadata like duration and codec using ffprobe. Fall back to filename analysis and file extension when content extraction is impossible. These files typically end up in a media category with subcategories based on metadata.

How do I undo a batch organization if something goes wrong?

Log every move operation with source and destination paths in a JSON manifest file. To undo, read the manifest and reverse each move. This is why the plan-then-execute pattern is critical — the plan itself serves as an undo log.


#FileOrganization #AIAgents #DocumentClassification #WorkflowAutomation #Python #Automation #AgenticAI #LearnAI #AIEngineering

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

AI Agents

Personal AI Assistant: How to Pick One for Business in 2026

A founder's guide to the personal AI assistant market: best AI assistant apps, business-grade options, and how CallSphere's voice agent fits in.

AI Agents

Free AI Agents in 2026: When Free Wins and When It Costs You

A founder's guide to free AI agents, low-code AI agent builders, and how to know when you should pay for a real platform like CallSphere.

Agentic AI

Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.

AI Agents

Chatbot App vs ChatGPT: What's the Difference, and Which Do I Need?

Chatbot app vs ChatGPT in 2026: a founder's clear take on the difference, when to use which, and how a real AI chatbot app development works.

HVAC

Building an HVAC After-Hours Emergency Escalation System: A Complete Engineering Guide

How we built a fault-tolerant HVAC emergency triage and tech-dispatch platform on Kubernetes — three-tier CQRS, 11 micro-agents on the OpenAI Agents SDK + LangGraph, NATS JetStream, DTMF/SMS/WebSocket acceptance, circuit breakers, and an evaluation pipeline that catches regressions before they wake a tech at 3 AM.

Enterprise AI

OpenAI Frontier vs Anthropic Managed Agents: 2026 Comparison

Head-to-head: OpenAI Frontier and Anthropic's managed agent stack — strengths, fit, and what each means for enterprise AI voice and chat deployment.