AI Quiz Generator Agent: Creating Assessments from Any Content Source

Build an AI agent that analyzes text, lectures, or documents and automatically generates multiple-choice, short-answer, and true/false questions with calibrated difficulty levels.

The Problem with Manual Quiz Creation

Instructors spend hours crafting quiz questions that test the right concepts at the right difficulty level. A single well-written multiple-choice question requires identifying the key concept, writing a clear stem, creating one correct answer, and generating plausible distractors — wrong answers that would tempt a student with a specific misconception. Scaling this process across an entire course is time-consuming and error-prone.

An AI quiz generator agent automates this by analyzing source content, identifying testable concepts, and producing questions across multiple formats with calibrated difficulty. Rather than rephrasing sentences as questions, it first extracts the underlying concepts and their common misconceptions, then generates questions that probe understanding of them.

Question Type Definitions

Start by defining a structured output format for different question types:

from enum import Enum

from pydantic import BaseModel, Field

class QuestionType(str, Enum):
    MULTIPLE_CHOICE = "multiple_choice"
    TRUE_FALSE = "true_false"
    SHORT_ANSWER = "short_answer"
    FILL_IN_BLANK = "fill_in_blank"

class Difficulty(str, Enum):
    RECALL = "recall"           # Remember facts
    UNDERSTANDING = "understanding"  # Explain concepts
    APPLICATION = "application"      # Apply to new situations
    ANALYSIS = "analysis"            # Break down and evaluate

class Distractor(BaseModel):
    text: str
    misconception: str = Field(
        description="The specific misconception this wrong answer targets"
    )

class QuizQuestion(BaseModel):
    question: str
    question_type: QuestionType
    difficulty: Difficulty
    correct_answer: str
    # Only populated for multiple-choice questions
    distractors: list[Distractor] = Field(default_factory=list)
    explanation: str = Field(
        description="Why the correct answer is right"
    )
    source_concept: str = Field(
        description="The concept from the source material being tested"
    )
    bloom_level: str = Field(
        description="Bloom taxonomy level: remember, understand, apply, "
                    "analyze, evaluate, create"
    )

class QuizOutput(BaseModel):
    title: str
    questions: list[QuizQuestion]
    coverage_summary: str
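
The schema above does not enforce how fields relate to the question type, so a light consistency check can catch malformed output before it reaches students. The following is a minimal sketch, not part of the original pipeline: `check_question_shape` is a hypothetical helper, the specific rules (at least two distractors, a literal true/false answer) are assumptions, and it operates on plain dicts such as those returned by `QuizQuestion.model_dump(mode="json")` under Pydantic v2.

```python
def check_question_shape(q: dict) -> list[str]:
    """Return a list of problems found in one question dict (empty if none)."""
    problems = []
    qtype = q.get("question_type")
    distractors = q.get("distractors", [])
    answer = str(q.get("correct_answer", "")).strip()

    if not answer:
        problems.append("missing correct_answer")

    if qtype == "multiple_choice":
        # Assumed rule: a usable multiple-choice item needs at least 2 wrong options.
        if len(distractors) < 2:
            problems.append("multiple_choice needs at least 2 distractors")
        texts = [d.get("text", "").strip().lower() for d in distractors]
        if answer and answer.lower() in texts:
            problems.append("a distractor duplicates the correct answer")
    elif qtype == "true_false":
        # Assumed rule: the answer is stored as the literal word true or false.
        if answer.lower() not in {"true", "false"}:
            problems.append("true_false answer must be 'true' or 'false'")
        if distractors:
            problems.append("true_false should not carry distractors")

    return problems
```

An empty list means the question passed every check; anything else can be surfaced to the instructor or used to trigger regeneration.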

Content Analysis Pipeline

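Before generating questions, the agent needs to extract key concepts from the source material. This two-stage approach tends to produce better questions than generating directly from raw text, because the generator works from explicit concepts, prerequisites, and misconceptions instead of surface sentences: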

from agents import Agent, Runner
import json

concept_extractor = Agent(
    name="Concept Extractor",
    instructions="""Analyze the provided educational content and extract
a structured list of key concepts. For each concept, return an object
with these keys:

1. "name": the concept name
2. "definition": a one-sentence definition
3. "prerequisites": other concepts it depends on
4. "misconceptions": common misconceptions students have about it
5. "cognitive_level": the level required to understand it
   (remember/understand/apply/analyze)

Return only a JSON array of these objects, with no surrounding prose.
Focus on concepts that are testable and skip transitional phrases and
meta-commentary.""",
)

async def extract_concepts(content: str) -> list[dict]:
    result = await Runner.run(
        concept_extractor,
        f"Extract testable concepts from this content:\n\n{content}",
    )
    return json.loads(result.final_output)
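
Calling `json.loads` directly on model output is fragile, because models sometimes wrap JSON in a Markdown code fence or omit fields. The sketch below shows one defensive way to parse the extractor's output. `parse_concepts` is a hypothetical helper, and the required keys `name` and `definition` are assumed from what `generate_distractors` reads later in this article.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker
REQUIRED_KEYS = {"name", "definition"}  # assumed key names


def parse_concepts(raw: str) -> list[dict]:
    """Parse model output into concept dicts, tolerating a wrapping code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag).
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence if present.
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]

    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of concepts")

    concepts = []
    for item in data:
        # Keep only well-formed concepts; skip anything missing required keys.
        if isinstance(item, dict) and REQUIRED_KEYS.issubset(item):
            item.setdefault("misconceptions", [])
            concepts.append(item)
    return concepts
```

Under these assumptions, `extract_concepts` could return `parse_concepts(result.final_output)` instead of calling `json.loads` directly.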

Distractor Generation Strategy

The quality of a multiple-choice question lives or dies on its distractors. Good distractors are plausible to a student with a specific misunderstanding but clearly wrong to a student who understands the concept:

distractor_agent = Agent(
    name="Distractor Generator",
    instructions="""You generate plausible wrong answers for
multiple-choice questions. Each distractor must:

1. Be grammatically consistent with the question stem
2. Be approximately the same length as the correct answer
3. Target a SPECIFIC misconception (document which one)
4. Never be partially correct or debatable
5. Never use absolute words like 'always' or 'never' that
   test-wise students would eliminate

For each distractor, explain the misconception it targets so
instructors can review the pedagogical reasoning.""",
)

async def generate_distractors(
    question: str, correct_answer: str, concept: dict, count: int = 3
) -> list[dict]:
    prompt = f"""Question: {question}
Correct answer: {correct_answer}
Concept: {concept['name']} — {concept['definition']}
Common misconceptions: {concept.get('misconceptions', [])}

Generate {count} distractors as a JSON array with 'text' and
'misconception' fields."""

    result = await Runner.run(distractor_agent, prompt)
    return json.loads(result.final_output)
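
Some of the distractor rules in the prompt can also be checked mechanically after generation. The sketch below covers the length and absolute-word rules plus a duplicate check. `review_distractor` is a hypothetical helper, and the 2.0 length ratio is an assumed threshold, not a value from the original pipeline.

```python
import re

ABSOLUTE_WORDS = {"always", "never"}  # the two examples named in the prompt


def review_distractor(
    text: str, correct_answer: str, max_length_ratio: float = 2.0
) -> list[str]:
    """Return rule violations for one distractor (empty list if none found)."""
    issues = []

    # Rule 5: absolute words make a distractor easy to eliminate.
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & ABSOLUTE_WORDS:
        issues.append("uses an absolute word")

    # A distractor must not simply restate the correct answer.
    if text.strip().lower() == correct_answer.strip().lower():
        issues.append("identical to the correct answer")

    # Rule 2: lengths should be roughly comparable (assumed ratio threshold).
    longer = max(len(text), len(correct_answer))
    shorter = max(1, min(len(text), len(correct_answer)))
    if longer / shorter > max_length_ratio:
        issues.append("length differs noticeably from the correct answer")

    return issues
```

Rules such as "never partially correct" still need human or model review; this check only filters the mechanical failures.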

The Quiz Generator Agent

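Now combine concept extraction and question generation into a single pipeline. The quiz generator below writes its own distractors under the same misconception-targeting rule, while the standalone distractor agent remains useful for regenerating individual options that fail review: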

quiz_generator = Agent(
    name="Quiz Generator",
    instructions="""You are an expert assessment designer. Given a list
of extracted concepts, generate quiz questions that:

1. Cover all major concepts from the source material
2. Mix question types (multiple choice, true/false, short answer,
   fill-in-blank)
3. Distribute difficulty across Bloom's taxonomy levels
4. Include clear explanations for correct answers
5. For multiple-choice questions, generate 3 distractors that each
   target a specific student misconception

Difficulty calibration rules:
- 40% recall/understanding questions (foundational)
- 40% application questions (intermediate)
- 20% analysis questions (challenging)

Return the quiz in the specified JSON schema.""",
    output_type=QuizOutput,
)

async def generate_quiz(
    content: str, num_questions: int = 10
) -> QuizOutput:
    # Stage 1: Extract concepts
    concepts = await extract_concepts(content)

    # Stage 2: Generate calibrated quiz
    prompt = f"""Source concepts:
{json.dumps(concepts, indent=2)}

Generate a quiz with {num_questions} questions covering these concepts.
Ensure balanced difficulty distribution and question type variety."""

    result = await Runner.run(quiz_generator, prompt)
    return result.final_output_as(QuizOutput)
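
Percentages in a prompt are easy for a model to approximate loosely, so one option is to convert them into explicit per-level counts and state those counts in the prompt. The sketch below does this with largest-remainder rounding. `target_counts` is a hypothetical helper, and splitting the 40% recall/understanding band evenly into 20% each is an assumption.

```python
# Assumed shares: the 40% recall/understanding band is split evenly.
TARGET_SHARES = {
    "recall": 0.2,
    "understanding": 0.2,
    "application": 0.4,
    "analysis": 0.2,
}


def target_counts(num_questions: int) -> dict[str, int]:
    """Split num_questions across difficulty levels (largest-remainder rounding)."""
    exact = {level: share * num_questions for level, share in TARGET_SHARES.items()}
    counts = {level: int(value) for level, value in exact.items()}

    # Hand out the questions lost to rounding down, largest fraction first.
    leftover = num_questions - sum(counts.values())
    by_remainder = sorted(
        exact, key=lambda level: exact[level] - counts[level], reverse=True
    )
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


print(target_counts(10))
```

For ten questions this gives two recall, two understanding, four application, and two analysis questions, which can be written into the prompt in place of "balanced difficulty distribution".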

Difficulty Calibration

A common failure mode is generating questions that are all the same difficulty. The prompt uses Bloom-style difficulty levels as a calibration framework, and a separate check validates the resulting distribution after generation:

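def validate_difficulty_distribution(
    quiz: QuizOutput,
    tolerance: float = 0.15,
) -> dict[str, object]:
    """Compare the quiz's difficulty mix against the target distribution."""
    counts: dict[str, int] = {}
    for q in quiz.questions:
        level = q.difficulty.value
        counts[level] = counts.get(level, 0) + 1

    total = len(quiz.questions)
    if total == 0:
        # Avoid dividing by zero on an empty quiz
        return {"distribution": {}, "warnings": ["quiz has no questions"]}
    distribution = {k: v / total for k, v in counts.items()}

    # Check against target distribution
    # (the 40% recall/understanding band is split evenly)
    targets = {"recall": 0.2, "understanding": 0.2,
               "application": 0.4, "analysis": 0.2}
    warnings: list[str] = []
    for level, target in targets.items():
        actual = distribution.get(level, 0.0)
        # Flag levels that deviate from the target by more than the tolerance
        if abs(actual - target) > tolerance:
            warnings.append(
                f"{level}: target {target:.0%}, actual {actual:.0%}"
            )

    return {"distribution": distribution, "warnings": warnings}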

FAQ

How do you ensure questions test understanding rather than just rephrasing the text?

The two-stage pipeline is key. By first extracting abstract concepts and their relationships, the question generation stage works from conceptual understanding rather than surface-level text. The Bloom's taxonomy classification pushes the agent toward application-level and analysis-level questions, which require deeper understanding than simple recall.

Can the agent generate questions from non-text sources like videos or slides?

Yes, with a preprocessing step. For videos, pass a transcript through the concept extractor. For slides, concatenate the text content with slide context. The concept extraction stage normalizes all source formats into the same structured representation, so the question generator works identically regardless of input format.
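
As an illustration of the slide preprocessing step, the sketch below joins slide text into a single string while keeping each slide's number and title as context. `slides_to_content` is a hypothetical helper, and the `title`, `text`, and `notes` keys are assumptions about how slides were exported, not a format defined by any library.

```python
def slides_to_content(slides: list[dict]) -> str:
    """Join slide dicts into one text block, keeping slide number and title."""
    sections = []
    for number, slide in enumerate(slides, start=1):
        title = slide.get("title", "").strip() or "Untitled"
        body = slide.get("text", "").strip()
        notes = slide.get("notes", "").strip()

        # A header line gives the extractor context about where text came from.
        lines = [f"[Slide {number}: {title}]"]
        if body:
            lines.append(body)
        if notes:
            lines.append(f"Speaker notes: {notes}")
        sections.append("\n".join(lines))

    # Blank lines between slides keep the boundaries visible.
    return "\n\n".join(sections)
```

The resulting string can be passed to `extract_concepts` in the same way as any other text source.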

How do you prevent duplicate or near-duplicate questions?

Add a deduplication pass after generation that computes semantic similarity between question stems using embeddings. Pairs of questions with cosine similarity above 0.85 should be flagged, and the agent can be prompted to regenerate one question of each pair so that it tests the same concept from a different angle.
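
The comparison step of that deduplication pass can be sketched with the standard library alone. The example assumes the question stems have already been embedded by whatever embedding model you use, so it takes plain lists of floats; `flag_near_duplicates` is a hypothetical helper name.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def flag_near_duplicates(
    embeddings: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(embeddings), 2)
        if cosine_similarity(a, b) > threshold
    ]
```

Each returned pair indexes into the quiz's question list. The pairwise comparison is quadratic in the number of questions, which is not a concern at typical quiz sizes.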

