Building a Resume Screening Agent: Automated Candidate Evaluation and Shortlisting

The Resume Screening Bottleneck

A single job posting can attract hundreds of applications. A commonly cited figure is that recruiters spend about 7 seconds per resume on initial screening, a pace that makes missed talent and inconsistent evaluation likely. An AI resume screening agent applies the same criteria to every candidate, evaluates skill matches systematically, and surfaces the strongest applicants while flagging potential sources of bias in the process.

The critical responsibility here is fairness. An automated screening system that perpetuates bias causes more harm than a manual process because it does so at scale. This guide builds bias mitigation directly into the architecture.

Resume Parsing and Structured Extraction

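The first step is converting unstructured resume text into a structured format the agent can reason about. The data classes below define that format for resumes and for job criteria, along with two in-memory stores that the tools read from.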

from dataclasses import dataclass
from agents import Agent, Runner, function_tool  # OpenAI Agents SDK (openai-agents package)
import json

@dataclass
class ParsedResume:
    candidate_id: str
    name: str
    email: str
    skills: list[str]
    experience_entries: list[dict]  # role, company, duration_months, description
    education: list[dict]  # degree, institution, year
    certifications: list[str]
    total_experience_years: float

@dataclass
class JobCriteria:
    job_id: str
    required_skills: list[str]
    preferred_skills: list[str]
    min_experience_years: int
    required_education: str  # "bachelor", "master", "none"
    required_certifications: list[str]
    weight_skills: float = 0.4
    weight_experience: float = 0.3
    weight_education: float = 0.15
    weight_certifications: float = 0.15

# In-memory stores keyed by ID; stand-ins for a real database in this example
PARSED_RESUMES: dict[str, ParsedResume] = {}
JOB_CRITERIA_DB: dict[str, JobCriteria] = {}
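
The listing above defines the target schema but not the extraction itself. As a rough illustration of how a few of those fields might be filled from raw text, here is a minimal regex-based sketch. The function name `extract_resume_fields` and the `KNOWN_SKILLS` vocabulary are hypothetical, and the experience heuristic is a crude assumption; a production system would more likely use a dedicated parser or an LLM extraction step.

```python
import re

# Hypothetical skill vocabulary, for illustration only
KNOWN_SKILLS = ["python", "sql", "docker", "kubernetes", "react"]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Matches phrases such as "6 years" or "3.5 yrs"
YEARS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:years?|yrs?)", re.IGNORECASE)


def extract_resume_fields(text: str) -> dict:
    """Pull a few ParsedResume-style fields out of raw resume text."""
    email_match = EMAIL_RE.search(text)
    lowered = text.lower()
    # Whole-word match so "sql" does not fire inside "mysql"
    skills = [s for s in KNOWN_SKILLS if re.search(rf"\b{re.escape(s)}\b", lowered)]
    years = [float(y) for y in YEARS_RE.findall(text)]
    return {
        "email": email_match.group(0) if email_match else "",
        "skills": skills,
        # Assumption: the largest stated figure approximates total experience
        "total_experience_years": max(years, default=0.0),
    }


sample = """Jane Doe | jane.doe@example.com
Backend engineer with 6 years of Python and SQL experience.
Deployed services with Docker."""

fields = extract_resume_fields(sample)
print(fields)
```

Keyword matching of this kind misses skills described in other words, which is one reason the FAQ below recommends normalizing skill names before comparison.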

Candidate Scoring Engine

The scoring tool evaluates each candidate against explicit, weighted criteria. Each dimension produces a normalized score between 0 and 1.

def _calculate_skill_score(
    candidate_skills: list[str],
    required: list[str],
    preferred: list[str],
) -> tuple[float, list[str], list[str]]:
    """Score skill match and return matched/missing skills."""
    candidate_lower = {s.lower() for s in candidate_skills}
    required_lower = {s.lower() for s in required}
    preferred_lower = {s.lower() for s in preferred}

    required_matches = candidate_lower & required_lower
    preferred_matches = candidate_lower & preferred_lower
    missing_required = required_lower - candidate_lower

    if not required_lower:
        score = 1.0
    else:
        required_ratio = len(required_matches) / len(required_lower)
        preferred_bonus = (
            len(preferred_matches) / len(preferred_lower) * 0.2
            if preferred_lower else 0
        )
        score = min(required_ratio + preferred_bonus, 1.0)

    # Sort so the output is deterministic (set iteration order is not)
    return score, sorted(required_matches | preferred_matches), sorted(missing_required)

@function_tool
def score_candidate(candidate_id: str, job_id: str) -> str:
    """Score a candidate against job criteria with detailed breakdown."""
    resume = PARSED_RESUMES.get(candidate_id)
    criteria = JOB_CRITERIA_DB.get(job_id)

    if not resume:
        return json.dumps({"error": "Candidate resume not found"})
    if not criteria:
        return json.dumps({"error": "Job criteria not found"})

    # Skill scoring
    skill_score, matched_skills, missing = _calculate_skill_score(
        resume.skills, criteria.required_skills, criteria.preferred_skills
    )

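    # Experience scoring: a role with no minimum gives every candidate full credit
    if criteria.min_experience_years <= 0:
        experience_score = 1.0
    else:
        exp_ratio = resume.total_experience_years / criteria.min_experience_years
        experience_score = min(exp_ratio, 1.0)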

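    # Education scoring: "degree" is expected to hold a normalized level such as
    # "bachelor"; unrecognized labels (e.g. "B.Sc.") fall back to 0 ("none")
    edu_levels = {"none": 0, "associate": 1, "bachelor": 2, "master": 3, "phd": 4}
    candidate_edu = max(
        (edu_levels.get(e.get("degree", "").lower(), 0) for e in resume.education),
        default=0,
    )
    required_edu = edu_levels.get(criteria.required_education.lower(), 0)
    # Candidates below the required level receive partial credit, not zero
    education_score = 1.0 if candidate_edu >= required_edu else 0.5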

    # Certification scoring
    if criteria.required_certifications:
        cert_lower = {c.lower() for c in resume.certifications}
        req_cert_lower = {c.lower() for c in criteria.required_certifications}
        cert_score = len(cert_lower & req_cert_lower) / len(req_cert_lower)
    else:
        cert_score = 1.0

    # Weighted total
    total = (
        skill_score * criteria.weight_skills
        + experience_score * criteria.weight_experience
        + education_score * criteria.weight_education
        + cert_score * criteria.weight_certifications
    )

    return json.dumps({
        "candidate_id": candidate_id,
        "overall_score": round(total * 100),
        "breakdown": {
            "skills": {"score": round(skill_score * 100), "matched": matched_skills, "missing": missing},
            "experience": {"score": round(experience_score * 100), "years": resume.total_experience_years},
            "education": {"score": round(education_score * 100)},
            "certifications": {"score": round(cert_score * 100)},
        },
        "recommendation": "advance" if total >= 0.7 else "review" if total >= 0.5 else "decline",
    })
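
To make the weighting concrete, the arithmetic can be reproduced in isolation. This is a standalone sketch using the default weights from `JobCriteria` and the thresholds from `score_candidate`; the helper names `weighted_total` and `recommend` and the candidate's per-dimension scores are made up for illustration.

```python
# Default weights copied from JobCriteria
WEIGHTS = {"skills": 0.4, "experience": 0.3, "education": 0.15, "certifications": 0.15}


def weighted_total(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each in 0..1) into a single 0..1 total."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())


def recommend(total: float) -> str:
    """Same thresholds as score_candidate: 0.7 to advance, 0.5 to review."""
    if total >= 0.7:
        return "advance"
    if total >= 0.5:
        return "review"
    return "decline"


# Hypothetical candidate: 3 of 4 required skills, half the required experience,
# below the required degree level, 1 of 2 required certifications
scores = {"skills": 0.75, "experience": 0.5, "education": 0.5, "certifications": 0.5}
total = weighted_total(scores)  # 0.30 + 0.15 + 0.075 + 0.075 = 0.60
print(round(total * 100), recommend(total))
```

One consequence of the design is worth knowing: the recommendation is computed from the unrounded total, while `overall_score` is rounded for display. A total of 0.696 is reported as 70 but still lands in "review".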

Bias Mitigation Tools

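Bias mitigation is not an afterthought; it is a core system requirement. The audit tool below checks the job criteria for subjective terms, and the agent's instructions require it to run before any shortlist is finalized.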

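@function_tool
def run_bias_audit(job_id: str, scored_candidates: str) -> str:
    """Audit a batch of scored candidates for potential bias indicators."""
    try:
        candidates = json.loads(scored_candidates)
    except json.JSONDecodeError:
        candidates = None
    if not isinstance(candidates, list):
        return json.dumps({"error": "scored_candidates must be a JSON array"})

    criteria = JOB_CRITERIA_DB.get(job_id)
    if not criteria:
        return json.dumps({"error": "Job criteria not found"})

    # Only criteria_objectivity is evaluated at runtime. The other flags record
    # design properties of score_candidate, which never reads names, institutions,
    # or employment gaps; nothing here verifies them.
    audit_checks = {
        "criteria_objectivity": True,
        "name_blind_scoring": True,
        "education_prestige_excluded": True,
        "gap_penalty_removed": True,
    }

    # Substring match so phrases like "strong culture fit" are also caught
    subjective_terms = ("culture fit", "communication style", "personality")
    all_skills = [s.lower() for s in criteria.required_skills + criteria.preferred_skills]
    if any(term in skill for skill in all_skills for term in subjective_terms):
        audit_checks["criteria_objectivity"] = False

    flagged = [name for name, passed in audit_checks.items() if not passed]
    return json.dumps({
        "audit_passed": not flagged,
        "candidates_audited": len(candidates),
        "checks": audit_checks,
        "flagged_issues": flagged,
        "recommendation": "Review flagged criteria before finalizing shortlist"
                          if flagged else "No issues found by the implemented checks",
    })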

screening_agent = Agent(
    name="ScreenBot",
    instructions="""You are ScreenBot, a resume screening assistant.
Evaluate candidates strictly against stated job criteria.
Never factor in candidate names, personal demographics, or school prestige.
Always run a bias audit before finalizing any shortlist.
Present results as scored rankings with clear justification for each score.""",
    tools=[score_candidate, run_bias_audit],
)
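
The instructions tell the model to ignore names and school prestige, but an instruction is weaker than never showing those fields to the model at all. One possible approach, sketched here under assumptions, is to redact a parsed resume before it reaches any prompt or tool output. The function `redact_for_scoring` and the choice of hidden fields are hypothetical; dropping the graduation year is an extra assumption, made because it can act as a proxy for age.

```python
import copy

# Fields hidden from the scorer. "name" and "email" come from ParsedResume;
# treating them as the full identity set is an assumption of this sketch.
HIDDEN_FIELDS = {"name", "email"}


def redact_for_scoring(resume: dict) -> dict:
    """Return a deep copy of a parsed resume without identity or prestige signals."""
    redacted = {k: copy.deepcopy(v) for k, v in resume.items() if k not in HIDDEN_FIELDS}
    # Keep only the degree level: the institution signals prestige, and the
    # graduation year can act as a proxy for age.
    redacted["education"] = [
        {"degree": entry.get("degree", "")} for entry in resume.get("education", [])
    ]
    return redacted


resume = {
    "candidate_id": "c-001",
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "skills": ["python", "sql"],
    "education": [{"degree": "bachelor", "institution": "Example University", "year": 2015}],
    "total_experience_years": 6.0,
}

blind = redact_for_scoring(resume)
print(blind)
```

The candidate ID is kept so that a recruiter can map a score back to the full resume after the ranking is produced.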

FAQ

How do you handle candidates who have relevant experience but use different terminology?

Implement a skills synonym mapping that normalizes variations. For example, "React.js", "ReactJS", and "React" should all map to the same skill. The skill matching function should compare against normalized forms rather than raw strings.
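
A minimal sketch of such a mapping follows. The alias table `SKILL_ALIASES` and the helper names are illustrative assumptions, not part of the scoring code above; in practice the table would be maintained per job family and reviewed regularly.

```python
import re

# Hypothetical alias table: variant spelling -> canonical skill name
SKILL_ALIASES = {
    "react.js": "react",
    "reactjs": "react",
    "node.js": "node",
    "nodejs": "node",
    "postgres": "postgresql",
    "k8s": "kubernetes",
}


def normalize_skill(raw: str) -> str:
    """Lowercase, collapse whitespace, then map known variants to one name."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return SKILL_ALIASES.get(key, key)


def normalize_skills(skills: list[str]) -> set[str]:
    return {normalize_skill(s) for s in skills}


candidate = normalize_skills(["React.js", "  NodeJS", "Postgres"])
required = normalize_skills(["React", "Node.js", "PostgreSQL"])
print(sorted(candidate & required))
```

Applying the same normalization to both the candidate's skills and the job criteria before the set comparison keeps the matching logic itself unchanged.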

What legal considerations apply to automated resume screening?

Several jurisdictions require disclosure when AI is used in hiring decisions. New York City's Local Law 144, for instance, mandates annual bias audits for automated employment decision tools. Always consult legal counsel, provide candidate opt-out options, and maintain human oversight for final hiring decisions.
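
One metric that bias audits commonly report is the impact ratio: each group's selection rate divided by the selection rate of the most-selected group. The sketch below shows the calculation only. The function name, the group labels "A" and "B", and the sample outcomes are hypothetical, and it assumes demographic data is collected separately and voluntarily and is never passed to the scoring tools. What counts as an acceptable ratio is a legal question, not something this code decides.

```python
from collections import Counter


def impact_ratios(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each group to its selection rate divided by the highest group rate.

    `outcomes` holds (group_label, was_advanced) pairs, one per candidate.
    """
    totals = Counter(group for group, _ in outcomes)
    selected = Counter(group for group, advanced in outcomes if advanced)
    rates = {group: selected[group] / totals[group] for group in totals}
    top_rate = max(rates.values(), default=0.0)
    if top_rate == 0:
        # Nobody was advanced, so there is no reference rate to compare against
        return {group: 0.0 for group in rates}
    return {group: rate / top_rate for group, rate in rates.items()}


# Made-up data: group A advances 6 of 10 candidates, group B advances 3 of 10
outcomes = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)
ratios = impact_ratios(outcomes)
print(ratios)
```

A low ratio for any group is a signal to revisit the criteria and weights with human reviewers before the shortlist is used.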

Should the agent completely replace human recruiters?

No. The agent should shortlist and rank candidates, but a human recruiter should review the shortlist before candidates are advanced or rejected. The agent accelerates the process and improves consistency, but human judgment remains essential for nuanced evaluation of career narratives and potential.
