Building a Meeting Notes Agent: Transcription, Summary, and Action Item Extraction

Build an AI agent that transcribes meeting audio, generates structured summaries with key decisions, extracts action items with assignees, and distributes notes to participants automatically.

Meetings Produce Value Only When Captured

By some estimates, senior managers spend close to 23 hours per week in meetings, yet much of what those meetings produce is forgotten within a day. Without structured notes, decisions get revisited, action items fall through the cracks, and absent team members miss critical context. A meeting notes agent solves this by transcribing audio, generating structured summaries, extracting action items with owners, and distributing the results to all participants.

This guide builds a complete meeting notes agent using Whisper for transcription, an LLM for intelligent summarization, and automated distribution via email or Slack.

Transcribing Audio with Whisper

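The first step is converting meeting audio to text. OpenAI's Whisper API handles this well across many languages and accents. The code below requests segment-level timestamps and, because the API rejects uploads larger than 25 MB, splits longer recordings into chunks: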

import tempfile
from dataclasses import dataclass
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # Whisper API upload limit (25 MB)

@dataclass
class TranscriptSegment:
    start: float
    end: float
    text: str
    speaker: str = ""

def _transcribe_file(path: str, offset: float = 0.0) -> list[TranscriptSegment]:
    """Send one audio file to Whisper and shift its timestamps by `offset` seconds."""
    with open(path, "rb") as f:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",  # needed to get segment timestamps
            timestamp_granularities=["segment"],
        )
    # Current openai-python releases return segment objects with
    # .start/.end/.text attributes rather than plain dicts
    return [
        TranscriptSegment(
            start=seg.start + offset,
            end=seg.end + offset,
            text=seg.text.strip(),
        )
        for seg in response.segments or []
    ]

def transcribe_audio(audio_path: str) -> list[TranscriptSegment]:
    """Transcribe meeting audio using Whisper with segment timestamps."""
    if Path(audio_path).stat().st_size <= MAX_UPLOAD_BYTES:
        return _transcribe_file(audio_path)
    # Files over the upload limit are split by duration instead
    return _transcribe_chunked(audio_path)

def _transcribe_chunked(audio_path: str, chunk_minutes: int = 10) -> list[TranscriptSegment]:
    """Handle large audio files by splitting them into fixed-length chunks."""
    from pydub import AudioSegment  # requires ffmpeg on the system

    audio = AudioSegment.from_file(audio_path)
    chunk_ms = chunk_minutes * 60 * 1000
    segments: list[TranscriptSegment] = []

    with tempfile.TemporaryDirectory() as tmp_dir:
        for start_ms in range(0, len(audio), chunk_ms):
            chunk = audio[start_ms:start_ms + chunk_ms]
            chunk_path = str(Path(tmp_dir) / f"chunk_{start_ms}.mp3")
            # 10 minutes at 64 kbps is about 4.8 MB, well under the upload limit;
            # export() returns an open file handle, so close it right away
            chunk.export(chunk_path, format="mp3", bitrate="64k").close()
            # Offset timestamps by where this chunk starts in the full recording
            segments.extend(_transcribe_file(chunk_path, offset=start_ms / 1000))

    return segments

Speaker Diarization

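Knowing who said what turns a transcript from a wall of text into a conversation. Whisper itself does not label speakers, so diarization (deciding which speaker produced each stretch of audio) comes from a separate model such as pyannote or a dedicated diarization API. The formatter below assumes each segment's speaker field is already filled in and optionally maps raw labels such as SPEAKER_00 to real names: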

def format_transcript_with_speakers(
    segments: list[TranscriptSegment],
    speaker_map: dict[str, str] | None = None,
) -> str:
    """Format transcript segments into a readable conversation."""
    if speaker_map is None:
        speaker_map = {}

    lines = []
    current_speaker = ""
    for seg in segments:
        speaker = speaker_map.get(seg.speaker, seg.speaker or "Speaker")
        timestamp = _format_time(seg.start)
        if speaker != current_speaker:
            lines.append(f"\n**{speaker}** [{timestamp}]:")
            current_speaker = speaker
        lines.append(f"  {seg.text}")

    return "\n".join(lines)

def _format_time(seconds: float) -> str:
    minutes = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{minutes:02d}:{secs:02d}"
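The formatter above assumes every segment already carries a speaker label. One common way to attach labels is to take the diarization model's speaker turns and give each Whisper segment the label of the turn it overlaps most. The sketch below assumes the turns have already been converted into hypothetical `(start_sec, end_sec, label)` tuples; it repeats the `TranscriptSegment` dataclass so it runs on its own.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    start: float
    end: float
    text: str
    speaker: str = ""


def assign_speakers(
    segments: list[TranscriptSegment],
    turns: list[tuple[float, float, str]],
) -> list[TranscriptSegment]:
    """Label each segment with the diarization speaker whose turn overlaps it most.

    `turns` is assumed to hold (start_sec, end_sec, label) tuples, e.g. converted
    from a diarization model's output. Segments with no overlap keep their label.
    """
    for seg in segments:
        best_label, best_overlap = seg.speaker, 0.0
        for turn_start, turn_end, label in turns:
            # Length of the intersection of [seg.start, seg.end] and [turn_start, turn_end];
            # negative or zero means the intervals do not overlap
            overlap = min(seg.end, turn_end) - max(seg.start, turn_start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        seg.speaker = best_label
    return segments


segs = [TranscriptSegment(0.0, 4.0, "Let's start."), TranscriptSegment(4.0, 9.0, "Sounds good.")]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]
assign_speakers(segs, turns)
print([s.speaker for s in segs])  # ['SPEAKER_00', 'SPEAKER_01']
```

The nested loop is proportional to segments times turns, which is fine for a single meeting; a sorted sweep over both lists would scale better for very long recordings.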

Generating Structured Summaries

The LLM transforms the raw transcript into a structured meeting summary with decisions, topics discussed, and key takeaways:

import json

def generate_meeting_summary(transcript: str, meeting_title: str = "") -> dict:
    """Generate a structured meeting summary from transcript."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a meeting notes assistant. Analyze the transcript and "
                    "return JSON with:\n"
                    "- title: meeting title (infer from content if not provided)\n"
                    "- date: meeting date if mentioned\n"
                    "- participants: list of participant names detected\n"
                    "- executive_summary: 2-3 sentence overview\n"
                    "- topics_discussed: list of {topic, key_points, decisions_made}\n"
                    "- action_items: list of {task, assignee, deadline, priority}\n"
                    "- open_questions: list of unresolved questions\n"
                    "- next_steps: list of agreed next steps"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Meeting: {meeting_title}\n\nTranscript:\n{transcript[:12000]}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)

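Truncating the transcript to 12,000 characters (roughly 3,000 tokens of English text) keeps each request small and its cost predictable, but it silently drops everything after that point; at typical speaking rates that is well under 30 minutes of conversation. gpt-4o's context window is far larger, so for longer meetings either raise the limit or split the transcript into chunks, summarize each chunk, and then generate a final summary from the chunk summaries.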
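The chunk-then-combine approach can be sketched as a small map-reduce. Here `summarize` is any function that takes text and returns a summary string (for example, a thin wrapper around the chat completions call above); it is passed in as a parameter, an assumption of this sketch, so the chunking logic can be exercised without an API key.

```python
from typing import Callable


def chunk_transcript(transcript: str, max_chars: int = 12000) -> list[str]:
    """Split a transcript into chunks of at most max_chars, breaking on line boundaries."""
    pieces: list[str] = []
    for line in transcript.splitlines():
        # Hard-split any single line that alone exceeds the budget
        for i in range(0, max(len(line), 1), max_chars):
            pieces.append(line[i:i + max_chars])

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n{piece}" if current else piece
        if len(candidate) > max_chars and current:
            # Adding this piece would overflow: close the current chunk
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_long_transcript(
    transcript: str,
    summarize: Callable[[str], str],
    max_chars: int = 12000,
) -> str:
    """Map step: summarize each chunk. Reduce step: summarize the joined summaries."""
    chunks = chunk_transcript(transcript, max_chars)
    if len(chunks) <= 1:
        return summarize(transcript)
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partials))
```

This sketch runs a single reduce pass. If the joined chunk summaries are themselves longer than the budget, the same function can be applied to them again before the final structured-summary call.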

Extracting Action Items with Assignees

Action items deserve special attention because they drive follow-up. The agent extracts them with explicit assignees, deadlines, and priority levels:

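def extract_action_items(summary: dict) -> list[dict]:
    """Extract action items from the meeting summary and normalize their fields."""
    validated = []
    for item in summary.get("action_items") or []:
        # Skip malformed entries and items without a task description
        if not isinstance(item, dict) or not str(item.get("task") or "").strip():
            continue
        priority = str(item.get("priority") or "medium").lower()
        validated.append({
            "task": str(item["task"]).strip(),
            # `or` also catches explicit nulls, which .get() defaults do not
            "assignee": item.get("assignee") or "Unassigned",
            "deadline": item.get("deadline") or "Not specified",
            "priority": priority if priority in {"high", "medium", "low"} else "medium",
            "status": "pending",
        })
    return validated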

def format_action_items_markdown(items: list[dict]) -> str:
    """Format action items as a Markdown checklist."""
    priority_labels = {"high": "[HIGH]", "medium": "[MED]", "low": "[LOW]"}
    lines = ["## Action Items\n"]
    for item in items:
        label = priority_labels.get(item["priority"], "")
        lines.append(
            f"- [ ] {label} **{item['task']}** - "
            f"Assigned to: {item['assignee']} | Due: {item['deadline']}"
        )
    return "\n".join(lines)
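Because `json_object` mode only guarantees valid JSON, not a particular shape, `extract_action_items` has to fill in defaults. An alternative is OpenAI's Structured Outputs, which constrains the response to a JSON Schema. The sketch below defines a strict schema for action items only; the names `ACTION_ITEM_SCHEMA` and `ACTION_ITEMS_RESPONSE_FORMAT` are illustrative, and you should confirm that the model snapshot you use supports `json_schema` response formats.

```python
import json

# Strict mode requires every property to appear in "required" and
# "additionalProperties" to be False; optional values are expressed as nullable types.
ACTION_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "task": {"type": "string"},
        "assignee": {"type": ["string", "null"]},
        "deadline": {"type": ["string", "null"]},
        "priority": {"type": "string", "enum": ["high", "medium", "low"]},
    },
    "required": ["task", "assignee", "deadline", "priority"],
    "additionalProperties": False,
}

# The root of a Structured Outputs schema must be an object, so the list is wrapped.
ACTION_ITEMS_RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "action_items",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "action_items": {"type": "array", "items": ACTION_ITEM_SCHEMA},
            },
            "required": ["action_items"],
            "additionalProperties": False,
        },
    },
}

print(json.dumps(ACTION_ITEMS_RESPONSE_FORMAT)[:40])
```

Pass `response_format=ACTION_ITEMS_RESPONSE_FORMAT` to `client.chat.completions.create` in place of `{"type": "json_object"}`. Null assignees and deadlines still need defaults downstream, but missing keys and unexpected priority values no longer occur.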

Distributing Meeting Notes

The agent formats the summary and sends it to participants via email or posts it to a Slack channel:


def format_meeting_notes(summary: dict, action_items: list[dict]) -> str:
    """Format complete meeting notes as Markdown."""
    notes = [f"# {summary.get('title', 'Meeting Notes')}\n"]
    notes.append(f"**Date:** {summary.get('date', 'Not specified')}")
    notes.append(f"**Participants:** {', '.join(summary.get('participants') or [])}\n")
    notes.append(f"## Summary\n{summary.get('executive_summary', '')}\n")

    for topic in summary.get("topics_discussed", []):
        notes.append(f"### {topic.get('topic', 'Untitled topic')}")
        for point in topic.get("key_points", []):
            notes.append(f"- {point}")
        if topic.get("decisions_made"):
            for decision in topic["decisions_made"]:
                notes.append(f"- **Decision:** {decision}")
        notes.append("")

    notes.append(format_action_items_markdown(action_items))

    if summary.get("open_questions"):
        notes.append("\n## Open Questions")
        for q in summary["open_questions"]:
            notes.append(f"- {q}")

    return "\n".join(notes)

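def send_to_slack(webhook_url: str, notes: str) -> None:
    """Post meeting notes to a Slack channel via an incoming webhook."""
    import httpx

    # Slack renders its own mrkdwn (*bold*, no # headings), so some Markdown
    # from format_meeting_notes will show up as literal characters
    response = httpx.post(webhook_url, json={"text": notes}, timeout=10.0)
    response.raise_for_status()  # surface bad webhook URLs instead of failing silently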

FAQ

How accurate is Whisper for meeting transcription?

Whisper achieves word error rates between 5 and 10 percent for clear English audio. Accuracy drops with heavy accents, background noise, or multiple people speaking simultaneously. For critical meetings, consider using a higher-quality microphone setup and post-processing the transcript with an LLM to fix obvious transcription errors.

How do I handle meetings longer than one hour?

Split the audio into 10-minute chunks for transcription, then concatenate the results. For summarization, generate a summary per chunk first, then feed all chunk summaries into a final summarization pass. This two-stage approach scales to multi-hour meetings while keeping each request within token limits; if the combined chunk summaries are still too long, repeat the reduce step on them.

Can the agent create tasks in project management tools?

Yes. After extracting action items, use the Jira, Asana, or Linear API to create tasks automatically. Map the assignee field to user IDs in your project management tool, set due dates from the extracted deadlines, and link back to the meeting notes for context.

