AI Quiz Generator Agent: Creating Assessments from Any Content Source

The Problem with Manual Quiz Creation

Instructors spend hours crafting quiz questions that test the right concepts at the right difficulty level. A single well-written multiple-choice question requires identifying the key concept, writing a clear stem, creating one correct answer, and generating plausible distractors — wrong answers that would tempt a student with a specific misconception. Scaling this process across an entire course is time-consuming and error-prone.

An AI quiz generator agent automates this by analyzing source content, identifying testable concepts, and producing questions across multiple formats with calibrated difficulty. Rather than rephrasing sentences as questions, it first builds a structured list of concepts, prerequisites, and misconceptions, then generates questions from that structure so they probe understanding instead of recall of the original wording.

Question Type Definitions

Start by defining a structured output format for different question types:

from pydantic import BaseModel, Field
from enum import Enum

class QuestionType(str, Enum):
    MULTIPLE_CHOICE = "multiple_choice"
    TRUE_FALSE = "true_false"
    SHORT_ANSWER = "short_answer"
    FILL_IN_BLANK = "fill_in_blank"

class Difficulty(str, Enum):
    RECALL = "recall"           # Remember facts
    UNDERSTANDING = "understanding"  # Explain concepts
    APPLICATION = "application"      # Apply to new situations
    ANALYSIS = "analysis"            # Break down and evaluate

class Distractor(BaseModel):
    text: str
    misconception: str = Field(
        description="The specific misconception this wrong answer targets"
    )

class QuizQuestion(BaseModel):
    question: str
    question_type: QuestionType
    difficulty: Difficulty
    correct_answer: str
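    # Only multiple-choice questions carry distractors; other types leave this empty
    distractors: list[Distractor] = Field(default_factory=list)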
    explanation: str = Field(
        description="Why the correct answer is right"
    )
    source_concept: str = Field(
        description="The concept from the source material being tested"
    )
    bloom_level: str = Field(
        description="Bloom taxonomy level: remember, understand, apply, "
                    "analyze, evaluate, create"
    )

class QuizOutput(BaseModel):
    title: str
    questions: list[QuizQuestion]
    coverage_summary: str
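
To make the schema concrete, the sketch below shows one multiple-choice question as a plain dictionary with the same field names as QuizQuestion, plus a small structural check. The helper name check_question_shape and the sample content are illustrative assumptions, not part of the original pipeline. The rule it encodes (only multiple-choice questions carry distractors) is inferred from the schema's empty default.

```python
# Illustrative payload: field names match QuizQuestion, the content is made up.
sample_question = {
    "question": "Which data structure offers average O(1) lookup by key?",
    "question_type": "multiple_choice",
    "difficulty": "recall",
    "correct_answer": "Hash table",
    "distractors": [
        {"text": "Linked list",
         "misconception": "Assumes cheap insertion implies cheap lookup"},
        {"text": "Binary heap",
         "misconception": "Confuses fast access to the minimum with keyed lookup"},
        {"text": "Sorted array",
         "misconception": "Treats O(log n) binary search as constant time"},
    ],
    "explanation": "A hash table maps each key to a bucket, so lookups "
                   "take constant time on average.",
    "source_concept": "Hash tables",
    "bloom_level": "remember",
}


def check_question_shape(q: dict) -> list[str]:
    """Return structural problems for one question dict (empty list if none).

    Hypothetical helper, assuming distractors belong only to
    multiple-choice questions.
    """
    problems = []
    is_mc = q.get("question_type") == "multiple_choice"
    distractors = q.get("distractors", [])
    if is_mc and not distractors:
        problems.append("multiple-choice question has no distractors")
    if not is_mc and distractors:
        problems.append("non-multiple-choice question carries distractors")
    answer = q.get("correct_answer", "").strip().lower()
    if any(d.get("text", "").strip().lower() == answer for d in distractors):
        problems.append("a distractor duplicates the correct answer")
    return problems


print(check_question_shape(sample_question))
```

A check like this could run on every generated question before an instructor reviews the quiz, so malformed items are caught early.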

Content Analysis Pipeline

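Before generating questions, the agent extracts key concepts from the source material. Working from extracted concepts rather than raw text gives the generator explicit prerequisites and misconceptions to draw on, which tends to produce better questions than prompting with the raw text directly: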

from agents import Agent, Runner
import json

concept_extractor = Agent(
    name="Concept Extractor",
    instructions="""Analyze the provided educational content and extract
a structured list of key concepts. For each concept, identify:

1. The concept name
2. A one-sentence definition
3. Prerequisites (other concepts it depends on)
4. Common misconceptions students have about it
5. The cognitive level required to understand it (remember/understand/
   apply/analyze)

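Return only a JSON array of concept objects, with no surrounding prose
or code fences. Each object must use the keys "name", "definition",
"prerequisites", "misconceptions", and "cognitive_level". Focus on
concepts that are testable; skip transitional phrases and
meta-commentary.""",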
)

async def extract_concepts(content: str) -> list[dict]:
    result = await Runner.run(
        concept_extractor,
        f"Extract testable concepts from this content:\n\n{content}",
    )
    return json.loads(result.final_output)
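
Calling json.loads directly on model output is fragile: the call raises if the reply is wrapped in a Markdown code fence or is not valid JSON. The following is a minimal defensive sketch, assuming the extractor returns objects with "name" and "definition" keys (the keys that generate_distractors reads later). The helper name parse_concepts is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal one
REQUIRED_KEYS = {"name", "definition"}  # assumed key names


def parse_concepts(raw: str) -> list[dict]:
    """Parse a model reply into a list of concept dicts (hypothetical helper)."""
    text = raw.strip()
    # Unwrap a fenced block if the model added one around the JSON.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of concepts")
    concepts = []
    for item in data:
        # Keep only objects that carry the keys later stages rely on.
        if isinstance(item, dict) and REQUIRED_KEYS.issubset(item):
            item.setdefault("misconceptions", [])
            concepts.append(item)
    return concepts


reply = FENCE + 'json\n[{"name": "Recursion", "definition": "A function that calls itself"}]\n' + FENCE
print(parse_concepts(reply))
```

In extract_concepts, the final line could then return parse_concepts(result.final_output) instead of calling json.loads directly.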

Distractor Generation Strategy

The quality of a multiple-choice question lives or dies on its distractors. Good distractors are plausible to a student with a specific misunderstanding but clearly wrong to a student who understands the concept:

distractor_agent = Agent(
    name="Distractor Generator",
    instructions="""You generate plausible wrong answers for
multiple-choice questions. Each distractor must:

1. Be grammatically consistent with the question stem
2. Be approximately the same length as the correct answer
3. Target a SPECIFIC misconception (document which one)
4. Never be partially correct or debatable
5. Never use absolute words like 'always' or 'never' that
   test-wise students would eliminate

For each distractor, explain the misconception it targets so
instructors can review the pedagogical reasoning.""",
)

async def generate_distractors(
    question: str, correct_answer: str, concept: dict, count: int = 3
) -> list[dict]:
    prompt = f"""Question: {question}
Correct answer: {correct_answer}
Concept: {concept['name']} — {concept['definition']}
Common misconceptions: {concept.get('misconceptions', [])}

Generate {count} distractors as a JSON array with 'text' and
'misconception' fields."""

    result = await Runner.run(distractor_agent, prompt)
    return json.loads(result.final_output)
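
Some of the distractor rules above can be checked mechanically after generation. The sketch below covers duplicates, missing misconception notes, absolute words, and large length differences. Whether a distractor is partially correct still needs human review. The function name lint_distractors and the length-ratio threshold of 2.0 are assumptions chosen for illustration.

```python
import re

# The two absolute words named in the agent instructions.
ABSOLUTE_WORDS = {"always", "never"}


def lint_distractors(
    correct_answer: str,
    distractors: list[dict],
    max_length_ratio: float = 2.0,  # assumed threshold, tune as needed
) -> list[str]:
    """Return warnings for a list of {'text', 'misconception'} dicts."""
    warnings = []
    seen = set()
    answer_norm = correct_answer.strip().lower()
    for d in distractors:
        text = d.get("text", "").strip()
        norm = text.lower()
        if not text:
            warnings.append("empty distractor text")
            continue
        if norm == answer_norm:
            warnings.append(f"'{text}' duplicates the correct answer")
        if norm in seen:
            warnings.append(f"'{text}' appears more than once")
        seen.add(norm)
        if not d.get("misconception", "").strip():
            warnings.append(f"'{text}' has no documented misconception")
        if set(re.findall(r"[a-z']+", norm)) & ABSOLUTE_WORDS:
            warnings.append(f"'{text}' uses an absolute word")
        # Compare lengths in characters; guard against a zero-length answer.
        longer = max(len(text), len(answer_norm))
        shorter = max(1, min(len(text), len(answer_norm)))
        if longer / shorter > max_length_ratio:
            warnings.append(f"'{text}' differs a lot in length from the answer")
    return warnings


print(lint_distractors(
    "Hash table",
    [{"text": "Linked list", "misconception": "Confuses insertion with lookup"},
     {"text": "It is always a list", "misconception": ""}],
))
```

Warnings from a lint pass like this could be fed back to the distractor agent as a regeneration prompt, or surfaced to the instructor alongside the question.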

The Quiz Generator Agent

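Now combine concept extraction and question generation into a single pipeline. In this version the quiz generator writes the distractors itself, guided by the misconceptions attached to each concept; the dedicated distractor agent above can be run as a separate refinement pass on individual multiple-choice questions: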

quiz_generator = Agent(
    name="Quiz Generator",
    instructions="""You are an expert assessment designer. Given a list
of extracted concepts, generate quiz questions that:

1. Cover all major concepts from the source material
2. Mix question types (multiple choice, true/false, short answer,
   fill-in-blank)
3. Distribute difficulty across Bloom's taxonomy levels
4. Include clear explanations for correct answers
5. For multiple-choice questions, generate 3 distractors that each
   target a specific student misconception

Difficulty calibration rules:
- 40% recall/understanding questions (foundational)
- 40% application questions (intermediate)
- 20% analysis questions (challenging)

Return the quiz in the specified JSON schema.""",
    output_type=QuizOutput,
)

async def generate_quiz(
    content: str, num_questions: int = 10
) -> QuizOutput:
    # Stage 1: Extract concepts
    concepts = await extract_concepts(content)

    # Stage 2: Generate calibrated quiz
    prompt = f"""Source concepts:
{json.dumps(concepts, indent=2)}

Generate a quiz with {num_questions} questions covering these concepts.
Ensure balanced difficulty distribution and question type variety."""

    result = await Runner.run(quiz_generator, prompt)
    return result.final_output_as(QuizOutput)
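
The first instruction to the quiz generator is to cover all major concepts, and that can be verified after generation. The following sketch works on plain dictionaries (for example, the extracted concepts and the questions dumped to dicts) and assumes that each question's source_concept repeats the concept name exactly, ignoring case and surrounding whitespace. The helper name uncovered_concepts is hypothetical.

```python
def uncovered_concepts(
    concepts: list[dict], questions: list[dict]
) -> list[str]:
    """Return names of concepts that no question claims to test.

    Assumes 'source_concept' matches the concept 'name' up to case and
    whitespace; paraphrased names would need fuzzier matching.
    """
    tested = {q["source_concept"].strip().lower() for q in questions}
    return [
        c["name"] for c in concepts
        if c["name"].strip().lower() not in tested
    ]


concepts = [{"name": "Hash tables"}, {"name": "Recursion"}]
questions = [{"source_concept": "hash tables "}]
print(uncovered_concepts(concepts, questions))  # ['Recursion']
```

If the returned list is not empty, one option is to call the quiz generator again with only the missing concepts and append the new questions.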

Difficulty Calibration

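A common failure mode is generating questions that all sit at the same difficulty. The agent uses Bloom's taxonomy levels as a calibration framework, and a validation step checks the distribution after generation. The check below splits the 40% foundational share evenly between recall and understanding and warns when any level is more than 15 percentage points off its target: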

def validate_difficulty_distribution(
    quiz: QuizOutput,
) -> dict[str, object]:
    """Compare the quiz's difficulty mix against the target distribution."""
    counts: dict[str, int] = {}
    for q in quiz.questions:
        level = q.difficulty.value
        counts[level] = counts.get(level, 0) + 1

    total = len(quiz.questions)
    if total == 0:
        # Avoid dividing by zero when the quiz is empty
        return {"distribution": {}, "warnings": ["quiz has no questions"]}
    distribution = {k: v / total for k, v in counts.items()}

    # Check against target distribution (fractions of all questions)
    targets = {"recall": 0.2, "understanding": 0.2,
               "application": 0.4, "analysis": 0.2}
    warnings: list[str] = []
    for level, target in targets.items():
        actual = distribution.get(level, 0.0)
        # Flag levels more than 15 percentage points off target
        if abs(actual - target) > 0.15:
            warnings.append(
                f"{level}: target {target:.0%}, actual {actual:.0%}"
            )

    return {"distribution": distribution, "warnings": warnings}
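
Percentages rarely divide evenly into a question count, so it can help to turn the targets into whole-number counts and state those in the prompt. The sketch below uses largest-remainder rounding, which is one possible choice rather than something the original pipeline specifies. The function name target_counts is hypothetical.

```python
import math

# Same target shares as the validation step above.
TARGETS = {"recall": 0.2, "understanding": 0.2,
           "application": 0.4, "analysis": 0.2}


def target_counts(
    num_questions: int, targets: dict[str, float] = TARGETS
) -> dict[str, int]:
    """Convert fractional targets into integer counts that sum to the total."""
    exact = {level: num_questions * share for level, share in targets.items()}
    counts = {level: math.floor(value) for level, value in exact.items()}
    leftover = num_questions - sum(counts.values())
    # Hand the remaining questions to the levels with the largest
    # fractional parts; ties keep the order of the targets dict.
    by_remainder = sorted(
        exact, key=lambda level: exact[level] - counts[level], reverse=True
    )
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


print(target_counts(10))
# {'recall': 2, 'understanding': 2, 'application': 4, 'analysis': 2}
```

The resulting counts could be interpolated into the generation prompt (for example, "write 4 application questions") so the model works from explicit numbers instead of percentages.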

FAQ

How do you ensure questions test understanding rather than just rephrasing the text?

The two-stage pipeline is key. By first extracting abstract concepts and their relationships, the question generation stage works from a structured representation of the material rather than its surface wording. The Bloom's taxonomy targets then steer the agent toward application and analysis questions, which require using or breaking down a concept rather than recalling a sentence.

Can the agent generate questions from non-text sources like videos or slides?

Yes, with a preprocessing step. For videos, pass a transcript through the concept extractor. For slides, concatenate the text content with slide context. The concept extraction stage normalizes all source formats into the same structured representation, so the question generator works identically regardless of input format.
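
As a sketch of the slide preprocessing step, the function below flattens a deck into one text document while keeping slide numbers and titles as context. The input shape (a list of dicts with "title", "bullets", and "notes" keys) and the name slides_to_text are assumptions for illustration; a real deck would first need its text extracted by a separate tool.

```python
def slides_to_text(slides: list[dict]) -> str:
    """Flatten slides into text, preserving slide number and title as context."""
    sections = []
    for number, slide in enumerate(slides, start=1):
        title = slide.get("title", "").strip() or "Untitled"
        lines = [f"Slide {number}: {title}"]
        # Keep non-empty bullet points, one per line.
        lines += [f"- {b.strip()}" for b in slide.get("bullets", []) if b.strip()]
        notes = slide.get("notes", "").strip()
        if notes:
            lines.append(f"Speaker notes: {notes}")
        sections.append("\n".join(lines))
    # Blank line between slides so boundaries stay visible to the model.
    return "\n\n".join(sections)


deck = [
    {"title": "Hash tables", "bullets": ["Key to bucket mapping", "Average O(1) lookup"]},
    {"title": "", "bullets": [], "notes": "Recap collisions next week"},
]
print(slides_to_text(deck))
```

The returned string can be passed to the concept extractor in the same way as any other text source.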

How do you prevent duplicate or near-duplicate questions?

Add a deduplication pass after generation that computes semantic similarity between question stems using embeddings. Questions with cosine similarity above 0.85 should be flagged, and the agent can be prompted to regenerate replacements that test the same concept from a different angle.
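
The deduplication pass described above can be sketched as follows. The embedding function is passed in as a parameter because the choice of embedding model is left open here; make_toy_embed is only a word-count stand-in so the example runs without external services, and both helper names are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(
    stems: list[str],
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.85,
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every pair of stems above the threshold."""
    vectors = [embed(stem) for stem in stems]
    flagged = []
    for i in range(len(stems)):
        for j in range(i + 1, len(stems)):
            score = cosine_similarity(vectors[i], vectors[j])
            if score > threshold:
                flagged.append((i, j, score))
    return flagged


def make_toy_embed(corpus: list[str]) -> Callable[[str], list[float]]:
    """Word-count vectors over the corpus vocabulary; NOT a real embedding."""
    vocab = sorted({w for s in corpus for w in s.lower().split()})

    def embed(text: str) -> list[float]:
        words = text.lower().split()
        return [float(words.count(w)) for w in vocab]

    return embed


stems = [
    "what does a hash table store",
    "what does a hash table store internally",
    "define recursion in one sentence",
]
pairs = find_near_duplicates(stems, make_toy_embed(stems))
print([(i, j) for i, j, _ in pairs])  # [(0, 1)]
```

The pairwise loop is quadratic in the number of questions, which is fine for a quiz of ten or twenty items. Flagged pairs can be sent back to the quiz generator with a request to replace one question with a different angle on the same concept.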

