
Building a Meeting Notes Agent: Transcription, Summary, and Action Item Extraction

Build an AI agent that transcribes meeting audio, generates structured summaries with key decisions, extracts action items with assignees, and distributes notes to participants automatically.

Meetings Produce Value Only When Captured

The average professional spends around 23 hours per week in meetings, yet most meeting outcomes evaporate within 24 hours. Without structured notes, decisions get revisited, action items fall through the cracks, and absent team members miss critical context. A meeting notes agent solves this by transcribing audio, generating structured summaries, extracting action items with owners, and distributing the results to all participants.

This guide builds a complete meeting notes agent using Whisper for transcription, an LLM for intelligent summarization, and automated distribution via email or Slack.

Transcribing Audio with Whisper

The first step is converting meeting audio to text. OpenAI's Whisper API handles this with high accuracy across languages and accents:

from openai import OpenAI
from pathlib import Path
from dataclasses import dataclass

client = OpenAI()

@dataclass
class TranscriptSegment:
    start: float
    end: float
    text: str
    speaker: str = ""

def transcribe_audio(audio_path: str) -> list[TranscriptSegment]:
    """Transcribe meeting audio using Whisper with timestamps."""
    file_path = Path(audio_path)

    # The Whisper API rejects files over 25 MB; chunk larger files by duration
    segments = []
    file_size = file_path.stat().st_size
    max_size = 25 * 1024 * 1024  # 25MB

    if file_size <= max_size:
        with open(audio_path, "rb") as f:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        # openai SDK >= 1.x returns segments as objects, not dicts
        for seg in response.segments:
            segments.append(TranscriptSegment(
                start=seg.start,
                end=seg.end,
                text=seg.text.strip(),
            ))
    else:
        segments = _transcribe_chunked(audio_path, max_size)

    return segments

def _transcribe_chunked(audio_path: str, max_size: int) -> list[TranscriptSegment]:
    """Handle large audio files by splitting into chunks."""
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_path)
    chunk_duration_ms = 10 * 60 * 1000  # 10 minutes per chunk
    segments = []
    offset = 0.0

    for i in range(0, len(audio), chunk_duration_ms):
        chunk = audio[i:i + chunk_duration_ms]
        chunk_path = f"/tmp/chunk_{i}.mp3"
        chunk.export(chunk_path, format="mp3")

        with open(chunk_path, "rb") as f:
            response = client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="verbose_json",
                timestamp_granularities=["segment"],
            )
        for seg in response.segments:
            segments.append(TranscriptSegment(
                start=seg.start + offset,
                end=seg.end + offset,
                text=seg.text.strip(),
            ))
        # Advance by the actual chunk length so the last (shorter) chunk stays aligned
        offset += len(chunk) / 1000

    return segments

Speaker Diarization

Knowing who said what transforms a transcript from a wall of text into a conversation. Whisper does not identify speakers, so speaker labels come from a separate diarization model such as pyannote or a dedicated API. Once segments carry speaker labels, formatting them into a readable conversation is straightforward:

def format_transcript_with_speakers(
    segments: list[TranscriptSegment],
    speaker_map: dict[str, str] | None = None,
) -> str:
    """Format transcript segments into a readable conversation."""
    if speaker_map is None:
        speaker_map = {}

    lines = []
    current_speaker = ""
    for seg in segments:
        speaker = speaker_map.get(seg.speaker, seg.speaker or "Speaker")
        timestamp = _format_time(seg.start)
        if speaker != current_speaker:
            lines.append(f"\n**{speaker}** [{timestamp}]:")
            current_speaker = speaker
        lines.append(f"  {seg.text}")

    return "\n".join(lines)

def _format_time(seconds: float) -> str:
    minutes = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{minutes:02d}:{secs:02d}"
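Attaching the diarization output to the transcript is a matter of matching time ranges. A minimal sketch, assuming the diarization model emits `(start, end, label)` tuples (the helper name `assign_speakers` is ours, not part of any library):

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:  # same shape as the dataclass defined earlier
    start: float
    end: float
    text: str
    speaker: str = ""

def assign_speakers(
    segments: list[TranscriptSegment],
    turns: list[tuple[float, float, str]],
) -> list[TranscriptSegment]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    for seg in segments:
        best_label, best_overlap = "", 0.0
        for start, end, label in turns:
            # Overlap between [seg.start, seg.end] and [start, end]
            overlap = min(seg.end, end) - max(seg.start, start)
            if overlap > best_overlap:
                best_overlap, best_label = overlap, label
        seg.speaker = best_label or seg.speaker
    return segments
```

Largest-overlap matching tolerates the small timestamp drift between Whisper segments and diarization turns without any alignment step.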

Generating Structured Summaries

The LLM transforms the raw transcript into a structured meeting summary with decisions, topics discussed, and key takeaways:


import json

def generate_meeting_summary(transcript: str, meeting_title: str = "") -> dict:
    """Generate a structured meeting summary from transcript."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a meeting notes assistant. Analyze the transcript and "
                    "return JSON with:\n"
                    "- title: meeting title (infer from content if not provided)\n"
                    "- date: meeting date if mentioned\n"
                    "- participants: list of participant names detected\n"
                    "- executive_summary: 2-3 sentence overview\n"
                    "- topics_discussed: list of {topic, key_points, decisions_made}\n"
                    "- action_items: list of {task, assignee, deadline, priority}\n"
                    "- open_questions: list of unresolved questions\n"
                    "- next_steps: list of agreed next steps"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Meeting: {meeting_title}\n\nTranscript:\n{transcript[:12000]}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)

Truncating the transcript to 12,000 characters keeps the request within token limits for most models. For longer meetings, split the transcript into chunks and summarize each chunk before generating a final summary.
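The chunk-splitting step can be a plain helper (the function name and overlap size are our choices); each chunk is summarized independently, and the partial summaries are then passed through the summarization call one more time:

```python
def split_transcript(text: str, chunk_size: int = 12000, overlap: int = 500) -> list[str]:
    """Split a transcript into chunks that fit within token limits.

    A small overlap preserves context across chunk boundaries.
    """
    if len(text) <= chunk_size:
        return [text]
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

For a 90-minute meeting this typically yields a handful of chunks, each small enough for a single summarization request.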

Extracting Action Items with Assignees

Action items deserve special attention because they drive follow-up. The agent extracts them with explicit assignees, deadlines, and priority levels:

def extract_action_items(summary: dict) -> list[dict]:
    """Extract and validate action items from the meeting summary."""
    items = summary.get("action_items", [])
    validated = []
    for item in items:
        validated.append({
            "task": item.get("task", ""),
            "assignee": item.get("assignee", "Unassigned"),
            "deadline": item.get("deadline", "Not specified"),
            "priority": item.get("priority", "medium"),
            "status": "pending",
        })
    return validated

def format_action_items_markdown(items: list[dict]) -> str:
    """Format action items as a Markdown checklist."""
    lines = ["## Action Items\n"]
    for item in items:
        priority_tag = {"high": "[HIGH]", "medium": "[MED]", "low": "[LOW]"}.get(
            item["priority"], ""
        )
        lines.append(
            f"- [ ] {priority_tag} **{item['task']}** — "
            f"Assigned to: {item['assignee']} | Due: {item['deadline']}"
        )
    return "\n".join(lines)

Distributing Meeting Notes

The agent formats the summary and sends it to participants via email or posts it to a Slack channel:

def format_meeting_notes(summary: dict, action_items: list[dict]) -> str:
    """Format complete meeting notes as Markdown."""
    notes = [f"# {summary.get('title', 'Meeting Notes')}\n"]
    notes.append(f"**Date:** {summary.get('date', 'Not specified')}")
    notes.append(f"**Participants:** {', '.join(summary.get('participants', []))}\n")
    notes.append(f"## Summary\n{summary.get('executive_summary', '')}\n")

    for topic in summary.get("topics_discussed", []):
        notes.append(f"### {topic['topic']}")
        for point in topic.get("key_points", []):
            notes.append(f"- {point}")
        if topic.get("decisions_made"):
            for decision in topic["decisions_made"]:
                notes.append(f"- **Decision:** {decision}")
        notes.append("")

    notes.append(format_action_items_markdown(action_items))

    if summary.get("open_questions"):
        notes.append("\n## Open Questions")
        for q in summary["open_questions"]:
            notes.append(f"- {q}")

    return "\n".join(notes)

def send_to_slack(webhook_url: str, notes: str) -> None:
    """Post meeting notes to a Slack channel via an incoming webhook."""
    import httpx
    response = httpx.post(webhook_url, json={"text": notes})
    response.raise_for_status()
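For email distribution, the standard library is enough. A sketch using `smtplib` (the function names, sender address, and SMTP host are placeholders you would configure for your environment):

```python
import smtplib
from email.message import EmailMessage

def build_notes_email(
    notes: str, subject: str, sender: str, recipients: list[str]
) -> EmailMessage:
    """Build a plain-text email carrying the meeting notes."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(notes)
    return msg

def send_notes_email(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send the notes through a local or configured SMTP relay."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Markdown renders poorly in plain-text email clients, so in practice you may want to convert the notes to HTML and attach both versions via `msg.add_alternative`.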

FAQ

How accurate is Whisper for meeting transcription?

Whisper achieves word error rates between 5 and 10 percent for clear English audio. Accuracy drops with heavy accents, background noise, or multiple people speaking simultaneously. For critical meetings, consider using a higher-quality microphone setup and post-processing the transcript with an LLM to fix obvious transcription errors.
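The post-processing pass mentioned above can be a second LLM call with a strict instruction not to alter meaning. A sketch of the prompt construction (the helper name is ours; `client` is the OpenAI client created earlier):

```python
def build_cleanup_messages(raw_transcript: str) -> list[dict]:
    """Build chat messages for an error-fixing pass over a raw transcript."""
    return [
        {
            "role": "system",
            "content": (
                "Fix spelling, punctuation, and obvious mis-transcriptions in this "
                "meeting transcript. Do not summarize, reorder, or omit content."
            ),
        },
        {"role": "user", "content": raw_transcript},
    ]

# Usage:
# response = client.chat.completions.create(
#     model="gpt-4o-mini", temperature=0.0,
#     messages=build_cleanup_messages(raw_text),
# )
# cleaned = response.choices[0].message.content
```

Temperature 0 and the "do not summarize" constraint keep the model from rewriting content rather than repairing it.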

How do I handle meetings longer than one hour?

Split the audio into 10-minute chunks for transcription, then concatenate the results. For summarization, generate a summary per chunk first, then feed all chunk summaries into a final summarization pass. This two-stage approach handles meetings of any length while staying within token limits.

Can the agent create tasks in project management tools?

Yes. After extracting action items, use the Jira, Asana, or Linear API to create tasks automatically. Map the assignee field to user IDs in your project management tool, set due dates from the extracted deadlines, and link back to the meeting notes for context.
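As a sketch of that mapping, here is a hypothetical converter from the extracted action-item dicts to a Jira-style create-issue payload (field names follow Jira's REST API shape; the function name and the assignee-to-account-ID lookup table are assumptions you would populate from your own workspace):

```python
def action_item_to_jira_payload(
    item: dict,
    project_key: str,
    account_ids: dict[str, str],
) -> dict:
    """Map an extracted action item to a Jira create-issue request body."""
    priority_map = {"high": "High", "medium": "Medium", "low": "Low"}
    fields = {
        "project": {"key": project_key},
        "issuetype": {"name": "Task"},
        "summary": item["task"],
        "description": f"From meeting notes. Due: {item.get('deadline', 'Not specified')}",
        "priority": {"name": priority_map.get(item.get("priority", "medium"), "Medium")},
    }
    # Only set an assignee when we can resolve the name to an account ID
    account_id = account_ids.get(item.get("assignee", ""))
    if account_id:
        fields["assignee"] = {"accountId": account_id}
    return {"fields": fields}
```

Unresolvable assignees are left unset rather than guessed, so unmatched tasks surface in the tracker's unassigned queue instead of landing on the wrong person.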



Written by the CallSphere Team.
