
Building a Writing Coach Agent: Grammar, Style, and Structure Feedback

Create an AI writing coach that provides layered feedback on grammar, style, structure, and tone — with actionable revision suggestions and progress tracking across writing sessions.

Why Writing Feedback Needs Layers

Good writing feedback operates at multiple levels simultaneously. A grammar checker catches surface errors but ignores whether the argument is coherent. A structural review ensures logical flow but might miss awkward phrasing. An effective writing coach agent addresses all these layers in a prioritized way — fixing a thesis statement is more important than fixing a comma splice.

The agent provides feedback in four categories, from most impactful to least: Structure (organization and argument flow), Content (clarity of ideas and evidence), Style (voice, tone, and readability), and Mechanics (grammar, spelling, punctuation).

Feedback Data Model

Define structured feedback that organizes suggestions by category and priority:

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class FeedbackCategory(str, Enum):
    STRUCTURE = "structure"
    CONTENT = "content"
    STYLE = "style"
    MECHANICS = "mechanics"

class Severity(str, Enum):
    CRITICAL = "critical"    # Must fix: breaks understanding
    IMPORTANT = "important"  # Should fix: weakens writing
    SUGGESTION = "suggestion"  # Could improve: polish

@dataclass
class WritingIssue:
    category: FeedbackCategory
    severity: Severity
    location: str  # Paragraph or sentence reference
    original_text: str
    issue_description: str
    suggestion: str
    revised_text: Optional[str] = None
    rule_name: Optional[str] = None  # e.g., "passive_voice"

@dataclass
class WritingAnalysis:
    overall_score: float  # 0-100
    category_scores: dict[str, float] = field(default_factory=dict)
    issues: list[WritingIssue] = field(default_factory=list)
    strengths: list[str] = field(default_factory=list)
    word_count: int = 0
    readability_grade: float = 0.0
    sentence_variety_score: float = 0.0

    @property
    def critical_issues(self) -> list[WritingIssue]:
        return [i for i in self.issues if i.severity == Severity.CRITICAL]

    @property
    def issues_by_category(self) -> dict[str, list[WritingIssue]]:
        grouped: dict[str, list[WritingIssue]] = {}
        for issue in self.issues:
            cat = issue.category.value
            if cat not in grouped:
                grouped[cat] = []
            grouped[cat].append(issue)
        return grouped

Readability Analysis

Before the AI agent reviews the writing, compute quantitative metrics that inform the feedback:

import re

def compute_readability_metrics(text: str) -> dict:
    """Compute readability statistics for the text."""
    sentences = re.split(r'[.!?]+', text)
    sentences = [s.strip() for s in sentences if s.strip()]
    words = text.split()
    syllable_count = sum(count_syllables(w) for w in words)

    num_sentences = len(sentences)
    num_words = len(words)

    if num_sentences == 0 or num_words == 0:
        return {"error": "text too short to analyze"}

    # Flesch-Kincaid Grade Level
    avg_sentence_length = num_words / num_sentences
    avg_syllables_per_word = syllable_count / num_words
    fk_grade = (
        0.39 * avg_sentence_length
        + 11.8 * avg_syllables_per_word
        - 15.59
    )

    # Sentence length variety (std deviation)
    lengths = [len(s.split()) for s in sentences]
    mean_length = sum(lengths) / len(lengths)
    variance = sum((l - mean_length) ** 2 for l in lengths) / len(lengths)
    std_dev = variance ** 0.5

    # Paragraph analysis
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    return {
        "word_count": num_words,
        "sentence_count": num_sentences,
        "paragraph_count": len(paragraphs),
        "avg_sentence_length": round(avg_sentence_length, 1),
        "sentence_length_std": round(std_dev, 1),
        "flesch_kincaid_grade": round(fk_grade, 1),
        "avg_syllables_per_word": round(avg_syllables_per_word, 2),
    }

def count_syllables(word: str) -> int:
    """Rough syllable count using vowel groups."""
    word = word.lower().strip(".,!?;:'\"")
    if not word:
        return 0
    vowels = "aeiouy"
    count = 0
    prev_vowel = False
    for char in word:
        is_vowel = char in vowels
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    if word.endswith("e") and count > 1:
        count -= 1
    return max(1, count)
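As a sanity check on the Flesch-Kincaid formula, here is a standalone helper that computes the grade directly from raw counts. The numbers in the example are illustrative, not taken from a real text:

```python
def fk_grade(num_words: int, num_sentences: int, num_syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed directly from raw counts."""
    avg_sentence_length = num_words / num_sentences
    avg_syllables_per_word = num_syllables / num_words
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59

# 100 words in 5 sentences (avg length 20) with 150 syllables (1.5/word)
# gives 0.39*20 + 11.8*1.5 - 15.59 = 9.91, roughly a tenth-grade level.
print(round(fk_grade(100, 5, 150), 2))
```

Longer sentences and longer words both push the grade up, which is why the pipeline flags readability extremes rather than rewarding complexity.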

The Multi-Layer Writing Coach

The writing coach agent operates as a pipeline of specialized reviewers, each focusing on one feedback category:


from agents import Agent, Runner
from pydantic import BaseModel

class StructureFeedback(BaseModel):
    thesis_clear: bool
    logical_flow: bool
    paragraph_transitions: list[str]
    organization_issues: list[str]
    suggestions: list[str]

structure_reviewer = Agent(
    name="Structure Reviewer",
    instructions="""Review the writing's organizational structure.
Evaluate:

1. THESIS/MAIN IDEA: Is there a clear central argument or purpose?
   If not, suggest where and how to add one.
2. LOGICAL FLOW: Do paragraphs follow a logical progression? Flag
   any jumps in logic or missing connections.
3. TRANSITIONS: Are transitions between paragraphs smooth? Identify
   abrupt shifts.
4. PARAGRAPH UNITY: Does each paragraph focus on one main idea?
   Flag paragraphs that try to cover too much.
5. INTRODUCTION/CONCLUSION: Does the intro set up the argument?
   Does the conclusion synthesize rather than merely repeat?

Focus ONLY on structure. Ignore grammar and style issues.""",
    output_type=StructureFeedback,
)

style_reviewer = Agent(
    name="Style Reviewer",
    instructions="""Review the writing's style and voice. Evaluate:

1. ACTIVE vs PASSIVE VOICE: Flag unnecessary passive constructions.
   "The ball was thrown by John" -> "John threw the ball"
2. WORDINESS: Identify phrases that can be shortened.
   "due to the fact that" -> "because"
3. SENTENCE VARIETY: Flag sections where sentence structure is
   monotonous (e.g., five Subject-Verb-Object sentences in a row).
4. TONE CONSISTENCY: Is the tone appropriate and consistent
   throughout? Flag shifts.
5. JARGON: Flag technical terms that are not defined for the audience.

Provide specific rewrites, not just general advice.""",
)
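Some style rules are deterministic enough that they don't need an LLM at all. A rule-based pre-pass can catch them cheaply before the style reviewer runs; the phrase table below is a minimal illustrative sketch, not an exhaustive list:

```python
import re

# Illustrative wordiness rules -- extend with your own phrase list.
WORDY_PHRASES = {
    "due to the fact that": "because",
    "at this point in time": "now",
    "in order to": "to",
    "a large number of": "many",
}

def flag_wordiness(text: str) -> list[dict]:
    """Flag wordy phrases with a suggested replacement, ordered by position."""
    findings = []
    for phrase, replacement in WORDY_PHRASES.items():
        for match in re.finditer(re.escape(phrase), text, re.IGNORECASE):
            findings.append({
                "original": match.group(0),
                "suggestion": replacement,
                "start": match.start(),
            })
    return sorted(findings, key=lambda f: f["start"])

findings = flag_wordiness("We met in order to plan, due to the fact that time was short.")
print([f["suggestion"] for f in findings])  # ['to', 'because']
```

Findings from a pre-pass like this can be converted into WritingIssue objects with `rule_name` set, leaving the LLM reviewers free to focus on judgments that pattern matching cannot make.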

Orchestrating the Review Pipeline

Run all reviewers in parallel and merge their feedback into a single prioritized report:

import asyncio
import json

async def full_writing_review(text: str, context: str = "") -> WritingAnalysis:
    """Run all review layers and produce a unified analysis."""
    metrics = compute_readability_metrics(text)

    prompt = f"Review this writing:\n\n{text}"
    if context:
        prompt += f"\n\nContext: {context}"

    # Run reviewers in parallel
    structure_task = Runner.run(structure_reviewer, prompt)
    style_task = Runner.run(style_reviewer, prompt)
    results = await asyncio.gather(structure_task, style_task)

    structure_result = results[0]
    style_result = results[1]

    analysis = WritingAnalysis(
        overall_score=0.0,
        word_count=metrics["word_count"],
        readability_grade=metrics["flesch_kincaid_grade"],
        sentence_variety_score=metrics["sentence_length_std"],
    )

    # Merge feedback from all reviewers and score
    # (In production, parse structured outputs into WritingIssue objects)
    analysis.overall_score = calculate_composite_score(
        metrics, structure_result, style_result
    )

    return analysis
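The pipeline above leaves calculate_composite_score undefined. One possible sketch, operating on a metrics dict and per-category scores (the weights, the grade 8-12 comfort band, and the penalty scale are all assumptions, not a standard -- and in the pipeline above you would first reduce the reviewer run results to category scores):

```python
def calculate_composite_score(metrics: dict, category_scores: dict[str, float]) -> float:
    """Blend category scores with a readability penalty. Illustrative weights only."""
    # Average the per-category scores; fall back to a neutral 50 if none exist.
    if category_scores:
        base = sum(category_scores.values()) / len(category_scores)
    else:
        base = 50.0
    grade = metrics.get("flesch_kincaid_grade", 10.0)
    # Penalize text far outside an assumed grade 8-12 comfort band.
    penalty = max(0.0, abs(grade - 10.0) - 2.0) * 2.0
    return max(0.0, min(100.0, base - penalty))

score = calculate_composite_score(
    {"flesch_kincaid_grade": 14.0},
    {"structure": 80.0, "content": 70.0, "style": 60.0, "mechanics": 90.0},
)
print(score)  # average 75.0 minus (|14 - 10| - 2) * 2 = 4 -> 71.0
```

Clamping to 0-100 keeps the score stable even when the readability penalty is large, and the neutral fallback means a review that produced no category scores doesn't crash the pipeline.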

Revision Suggestion Engine

Instead of just pointing out problems, the agent generates concrete revision options:

revision_agent = Agent(
    name="Revision Suggester",
    instructions="""Given a piece of writing and identified issues,
generate specific revision suggestions. For each issue:

1. Quote the exact original text
2. Explain what is wrong and why it matters
3. Provide 2-3 alternative phrasings ranked by quality
4. Explain why the top suggestion is best

Never rewrite the entire piece. Focus on targeted improvements
that the writer can learn from. The goal is to teach the writer
to self-edit, not to edit for them.

Format each suggestion clearly so the writer can accept or reject
individual changes.""",
)

async def get_revision_suggestions(
    text: str, issues: list[WritingIssue]
) -> str:
    issue_summary = json.dumps([
        {
            "category": i.category.value,
            "location": i.location,
            "description": i.issue_description,
            "original": i.original_text,
        }
        for i in issues[:10]  # Limit to top 10 issues
    ])

    result = await Runner.run(
        revision_agent,
        f"Writing:\n{text}\n\nIssues to address:\n{issue_summary}",
    )
    return result.final_output
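Truncating to `issues[:10]` only helps if the list is already ordered by importance. A small sort helper can guarantee that; the severity and category ranks below simply encode the priority order stated earlier (structure over content over style over mechanics), and the dict-based issues are a stand-in for WritingIssue objects:

```python
SEVERITY_RANK = {"critical": 0, "important": 1, "suggestion": 2}
# Category impact order from the feedback model: structure > content > style > mechanics.
CATEGORY_RANK = {"structure": 0, "content": 1, "style": 2, "mechanics": 3}

def prioritize_issues(issues: list[dict]) -> list[dict]:
    """Sort issues by severity first, then by category impact."""
    return sorted(
        issues,
        key=lambda i: (SEVERITY_RANK[i["severity"]], CATEGORY_RANK[i["category"]]),
    )

issues = [
    {"severity": "suggestion", "category": "mechanics"},
    {"severity": "critical", "category": "style"},
    {"severity": "critical", "category": "structure"},
]
ordered = prioritize_issues(issues)
print([(i["severity"], i["category"]) for i in ordered])
```

Calling this before slicing ensures a critical structure problem is never crowded out of the top ten by a pile of comma suggestions.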

FAQ

How does the agent avoid overwhelming the writer with too many issues at once?

The severity classification (critical, important, suggestion) creates a natural triage. The agent presents critical issues first — things like unclear thesis, broken logic flow, or sentences that are genuinely confusing. Style suggestions and minor mechanics come last. For first drafts, the agent might limit feedback to structure and content only, deferring style and mechanics to later revision rounds.

Can the agent adapt to different writing contexts like academic vs. business vs. creative?

Yes. The context parameter passed to the review pipeline changes the evaluation criteria. Academic writing needs formal tone, citation support, and hedged claims. Business writing prioritizes brevity and clear action items. Creative writing tolerates rule-breaking for effect. The agent's system prompt includes context-specific rules so "Use active voice" becomes a firm rule in business writing but a suggestion in creative writing.
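One lightweight way to implement context-specific rules is a per-context severity override table that the reviewers consult when classifying an issue. The rule names and mappings below are illustrative assumptions:

```python
# Illustrative per-context overrides: rule_name -> severity.
CONTEXT_RULES = {
    "business": {"passive_voice": "important", "wordiness": "important"},
    "academic": {"passive_voice": "suggestion", "missing_citation": "critical"},
    "creative": {"passive_voice": "suggestion", "sentence_fragment": "suggestion"},
}

def severity_for(rule_name: str, context: str, default: str = "suggestion") -> str:
    """Look up a rule's severity for a context, falling back to a default."""
    return CONTEXT_RULES.get(context, {}).get(rule_name, default)

print(severity_for("passive_voice", "business"))  # important
print(severity_for("passive_voice", "creative"))  # suggestion
```

The same table can be serialized into the reviewers' system prompts, so the LLM's judgment and the deterministic checks stay consistent with each other.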

How do you track improvement over multiple writing sessions?

Store each WritingAnalysis result with a timestamp and compare category scores over time. A student who consistently improves their structure score from 60 to 80 but plateaus on style at 55 would see the agent shift its coaching emphasis toward style. Trend visualization and session-over-session diffs help the student see concrete progress.
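A minimal sketch of that tracking, held in memory here (a real application would persist each WritingAnalysis to a database and key sessions by student):

```python
from datetime import datetime

class ProgressTracker:
    """Record per-session category scores and surface trends."""

    def __init__(self):
        self.sessions: list[tuple[datetime, dict[str, float]]] = []

    def record(self, category_scores: dict[str, float]) -> None:
        self.sessions.append((datetime.now(), dict(category_scores)))

    def trend(self, category: str) -> float:
        """Score change in a category from the first to the latest session."""
        scores = [s[1][category] for s in self.sessions if category in s[1]]
        if len(scores) < 2:
            return 0.0
        return scores[-1] - scores[0]

    def weakest_category(self) -> str:
        """Category with the lowest latest score -- where coaching should focus."""
        latest = self.sessions[-1][1]
        return min(latest, key=latest.get)

tracker = ProgressTracker()
tracker.record({"structure": 60.0, "style": 55.0})
tracker.record({"structure": 80.0, "style": 55.0})
print(tracker.trend("structure"))   # 20.0
print(tracker.weakest_category())   # style
```

With this in place, the coach can compare `trend` across categories each session and shift its emphasis toward whichever score has plateaued.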


#WritingCoach #GrammarAnalysis #AIFeedback #Python #EducationAI #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

