Error Handling and Retry Patterns for Playwright AI Agents

Build resilient Playwright AI agents with comprehensive error handling for timeouts, missing elements, navigation failures, and network errors, plus retry decorators and graceful degradation strategies.

Why Error Handling Is Critical for Browser Automation Agents

Browser automation is inherently unreliable. Networks fail, pages load slowly, elements appear and disappear unpredictably, and websites deploy updates that change their DOM structure without warning. An AI agent that does not handle these failures gracefully will crash on its first encounter with the real web.

Production-grade Playwright agents need layered error handling: catching specific exceptions, implementing intelligent retry logic, providing fallback strategies, and logging sufficient context for debugging. This post covers patterns that make your agents resilient.

Playwright Exception Types

Playwright's Python API raises two main exception types, Error and its subclass TimeoutError. For everything other than timeouts, the error message tells you what went wrong:

from playwright.sync_api import (
    sync_playwright,
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    try:
        page.goto("https://example.com", timeout=5000)
    except PlaywrightTimeout:
        print("Page took too long to load")
    except PlaywrightError as e:
        if "net::ERR_NAME_NOT_RESOLVED" in str(e):
            print("DNS resolution failed — invalid domain")
        elif "net::ERR_CONNECTION_REFUSED" in str(e):
            print("Server refused the connection")
        elif "net::ERR_CONNECTION_TIMED_OUT" in str(e):
            print("Connection timed out at network level")
        else:
            print(f"Browser error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        browser.close()

The key exceptions to handle are:

  • TimeoutError — element not found within timeout, page did not load
  • Error with network messages — DNS, connection, SSL failures
  • Error with element messages — element detached, not visible, not clickable
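
The three groups above can be told apart by inspecting the error message. The following is a minimal sketch, not an official Playwright taxonomy: the function name classify_error, the category labels, and the substring hints are all assumptions chosen for illustration, and real messages vary between Playwright versions.

```python
def classify_error(message: str) -> str:
    """Map a Playwright error message to a coarse category.

    Category names and substring hints are illustrative assumptions.
    """
    text = message.lower()

    # Chromium network failures carry a net::ERR_* code in the message
    if "net::err_" in text:
        return "network"

    # Timeout messages typically read "Timeout 5000ms exceeded."
    if "timeout" in text and "exceeded" in text:
        return "timeout"

    # Hypothetical hints for element-state problems
    element_hints = (
        "detached",
        "not attached",
        "not visible",
        "intercepts pointer events",
    )
    if any(hint in text for hint in element_hints):
        return "element"

    return "unknown"
```

In an except block you would call classify_error(str(e)) and branch on the returned label instead of repeating substring checks at every call site.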

Handling Element Not Found

The most common failure in browser automation is trying to interact with an element that does not exist or is not ready:

def safe_click(page, selector: str, timeout: int = 5000) -> bool:
    """Click an element if it exists, return success status."""
    try:
        locator = page.locator(selector)
        locator.wait_for(state="visible", timeout=timeout)
        locator.click()
        return True
    except PlaywrightTimeout:
        print(f"Element not found: {selector}")
        return False
    except PlaywrightError as e:
        print(f"Cannot click {selector}: {e}")
        return False

def safe_fill(page, selector: str, value: str, timeout: int = 5000) -> bool:
    """Fill a form field if it exists, return success status."""
    try:
        locator = page.locator(selector)
        locator.wait_for(state="visible", timeout=timeout)
        locator.fill(value)
        return True
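    except PlaywrightTimeout:
        print(f"Field not found: {selector}")
        return False
    except PlaywrightError as e:
        # Mirrors safe_click: covers cases such as a non-editable element
        print(f"Cannot fill {selector}: {e}")
        return False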

def safe_text(page, selector: str, default: str = "") -> str:
    """Extract text content safely."""
    try:
        locator = page.locator(selector)
        if locator.count() > 0:
            return locator.first.text_content() or default
        return default
    except Exception:
        return default

Building a Retry Decorator

A generic retry decorator that handles transient failures:

import time
import functools
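from playwright.sync_api import (
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

def retry(
    max_attempts: int = 3,
    delay: float = 1.0,
    backoff: float = 2.0,
    # Retry only Playwright failures by default; a bare Exception here
    # would also retry programming errors such as TypeError.
    exceptions: tuple = (PlaywrightTimeout, PlaywrightError),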
):
    """
    Retry decorator with exponential backoff.

    Args:
        max_attempts: Maximum number of attempts
        delay: Initial delay between retries in seconds
        backoff: Multiplier for delay after each retry
        exceptions: Tuple of exception types to catch
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            current_delay = delay
            last_exception = None

            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e
                    if attempt == max_attempts:
                        print(
                            f"[{func.__name__}] Failed after "
                            f"{max_attempts} attempts: {e}"
                        )
                        raise
                    print(
                        f"[{func.__name__}] Attempt {attempt} failed: {e}. "
                        f"Retrying in {current_delay:.1f}s..."
                    )
                    time.sleep(current_delay)
                    current_delay *= backoff

        return wrapper
    return decorator

# Usage
@retry(max_attempts=3, delay=2.0, backoff=2.0)
def navigate_and_extract(page, url: str) -> dict:
    page.goto(url, wait_until="networkidle", timeout=10000)
    return {
        "title": page.title(),
        "content": page.locator("main").text_content(),
    }
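
To see what the decorator's delay and backoff parameters produce, it can help to compute the sleep schedule on its own. The sketch below does that and adds optional random jitter, which the decorator above does not implement; backoff_delays and its jitter parameter are hypothetical names introduced here for illustration.

```python
import random

def backoff_delays(max_attempts, delay, backoff, jitter=0.0, rng=None):
    """Return the sleep durations between attempts.

    jitter is a fraction of each delay: 0.5 spreads a 2s delay over 1s to 3s.
    """
    rng = rng or random.Random()
    delays = []
    current = delay
    # There is no sleep after the final attempt, hence max_attempts - 1
    for _ in range(max_attempts - 1):
        spread = current * jitter
        delays.append(current + rng.uniform(-spread, spread))
        current *= backoff
    return delays

# Same settings as the usage example: three attempts, two sleeps
print(backoff_delays(3, 2.0, 2.0))
```

Jitter matters mostly when many agents retry against the same site at once, since identical delays make them all retry at the same moment.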

Page-Level Retry with Fresh Context

Sometimes the page itself gets into a bad state. Retry with a fresh browser context:

from playwright.sync_api import sync_playwright

def robust_scrape(url: str, max_attempts: int = 3) -> dict:
    """Scrape a URL with retry logic that creates fresh contexts."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        last_error = None

        try:
            for attempt in range(1, max_attempts + 1):
                # A fresh context starts without cookies or cached state
                context = browser.new_context()
                page = context.new_page()

                try:
                    page.goto(url, wait_until="networkidle", timeout=15000)

                    # Wait for content to be present
                    page.wait_for_selector("body", timeout=5000)

                    return {
                        "url": url,
                        "title": page.title(),
                        # text_content() can return None
                        "text": (page.locator("body").text_content() or "")[:5000],
                        "attempt": attempt,
                    }

                except Exception as e:
                    last_error = str(e)
                    print(f"Attempt {attempt}/{max_attempts} failed: {e}")

                finally:
                    context.close()

            return {"url": url, "error": last_error}

        finally:
            # Runs on every exit path, including the early return above
            browser.close()

Graceful Degradation Pattern

When an agent cannot complete its primary task, fall back to progressively simpler strategies:

class ResilientAgent:
    def __init__(self, browser):
        self.browser = browser

    def extract_product_data(self, url: str) -> dict:
        """
        Try multiple strategies to extract product data,
        degrading gracefully if preferred methods fail.
        """
        context = self.browser.new_context()
        page = context.new_page()
        result = {"url": url, "strategy": None}

        try:
            page.goto(url, wait_until="networkidle", timeout=15000)

            # Strategy 1: Structured data (JSON-LD)
            try:
                json_ld = page.locator(
                    'script[type="application/ld+json"]'
                ).text_content()
                import json
                data = json.loads(json_ld)
                result.update({
                    "name": data.get("name"),
                    "price": data.get("offers", {}).get("price"),
                    "strategy": "json-ld",
                })
                return result
            except Exception:
                pass

            # Strategy 2: Open Graph meta tags
            try:
                result.update({
                    "name": page.locator(
                        'meta[property="og:title"]'
                    ).get_attribute("content"),
                    "price": None,
                    "strategy": "open-graph",
                })
                if result["name"]:
                    return result
            except Exception:
                pass

            # Strategy 3: DOM selectors (least reliable)
            # safe_text never raises, so check the result instead of
            # relying on an exception to reach the next strategy
            name = safe_text(page, "h1") or safe_text(page, ".product-title")
            if name:
                result.update({
                    "name": name,
                    "price": (
                        safe_text(page, ".price")
                        or safe_text(page, "[data-price]")
                    ),
                    "strategy": "dom-selectors",
                })
                return result

            # Strategy 4: Take a screenshot for manual review
            page.screenshot(path=f"fallback_{hash(url)}.png")
            result.update({
                "name": page.title(),
                "price": None,
                "strategy": "screenshot-fallback",
            })
            return result

        except Exception as e:
            result["error"] = str(e)
            result["strategy"] = "failed"
            return result

        finally:
            context.close()
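
Strategy 1 above assumes the JSON-LD block is a single object with a dict-valued offers field. In practice the payload is often a list of nodes or wrapped in an @graph key. The sketch below shows one way to normalise those shapes before reading the name and price. The function name product_from_json_ld and the decision to filter on @type "Product" are assumptions added for illustration, not part of the class above.

```python
import json
from typing import Optional

def product_from_json_ld(raw: str) -> Optional[dict]:
    """Return {"name", "price"} for the first Product node, or None."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        # None input or malformed JSON
        return None

    # Normalise the common shapes into a flat list of nodes
    if isinstance(data, dict) and isinstance(data.get("@graph"), list):
        nodes = data["@graph"]
    elif isinstance(data, list):
        nodes = data
    else:
        nodes = [data]

    for node in nodes:
        if not isinstance(node, dict):
            continue
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if "Product" not in types:
            continue

        # offers may be a dict, a list of dicts, or missing
        offers = node.get("offers") or {}
        if isinstance(offers, list):
            offers = offers[0] if offers else {}
        price = offers.get("price") if isinstance(offers, dict) else None
        return {"name": node.get("name"), "price": price}

    return None
```

Returning None rather than raising lets the caller fall through to the next strategy with a plain if check.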

Timeout Configuration

Configure timeouts at different levels for fine-grained control:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()

    # Context-level default timeout (applies to all actions)
    context = browser.new_context()
    context.set_default_timeout(10000)        # 10s for actions
    context.set_default_navigation_timeout(30000)  # 30s for navigation

    page = context.new_page()

    # Page-level timeout override
    page.set_default_timeout(5000)

    # Per-action timeout (highest priority)
    page.goto("https://example.com", timeout=60000)
    page.locator("#slow-widget").wait_for(state="visible", timeout=20000)

    context.close()
    browser.close()

Timeout priority from highest to lowest: per-action > page-level > context-level > default (30 seconds).
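
The priority order can be expressed as a small lookup. This is only a model of the rule stated above, not Playwright's internal code, and it ignores the separate navigation-timeout settings. The name effective_timeout and its parameters are hypothetical.

```python
PLAYWRIGHT_DEFAULT_MS = 30_000  # the 30-second default mentioned above

def effective_timeout(per_action=None, page_level=None, context_level=None):
    """Return the first timeout that is set, checked from highest priority down."""
    for value in (per_action, page_level, context_level):
        # Compare against None so an explicit 0 is not skipped
        if value is not None:
            return value
    return PLAYWRIGHT_DEFAULT_MS

# Values from the example above: the wait_for call passes timeout=20000
print(effective_timeout(per_action=20000, page_level=5000, context_level=10000))
# An action with no explicit timeout falls back to the page-level 5000
print(effective_timeout(page_level=5000, context_level=10000))
```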

Comprehensive Error-Handling Agent

Putting it all together in a production-ready agent:

import logging
import time
from dataclasses import dataclass
from playwright.sync_api import (
    sync_playwright,
    TimeoutError as PlaywrightTimeout,
    Error as PlaywrightError,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("browser_agent")

@dataclass
class AgentResult:
    url: str
    success: bool
    data: dict | None = None
    error: str | None = None
    attempts: int = 0

class RobustBrowserAgent:
    def __init__(self, max_retries: int = 3, timeout: int = 15000):
        self.max_retries = max_retries
        self.timeout = timeout

    def execute(self, url: str, task_fn) -> AgentResult:
        with sync_playwright() as p:
            browser = p.chromium.launch()

            for attempt in range(1, self.max_retries + 1):
                context = browser.new_context()
                context.set_default_timeout(self.timeout)
                page = context.new_page()

                try:
                    logger.info(
                        f"Attempt {attempt}/{self.max_retries}: {url}"
                    )
                    page.goto(url, wait_until="networkidle")
                    data = task_fn(page)
                    return AgentResult(
                        url=url, success=True,
                        data=data, attempts=attempt,
                    )

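                except PlaywrightTimeout as e:
                    logger.warning(f"Timeout on attempt {attempt}: {e}")
                    try:
                        page.screenshot(
                            path=f"timeout_attempt_{attempt}.png"
                        )
                    except PlaywrightError:
                        # The page may be unresponsive; keep retrying anyway
                        logger.warning("Could not capture timeout screenshot")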

                except PlaywrightError as e:
                    error_msg = str(e)
                    if "net::ERR_" in error_msg:
                        logger.error(f"Network error: {error_msg}")
                    else:
                        logger.error(f"Browser error: {error_msg}")

                except Exception as e:
                    logger.error(f"Unexpected error: {e}")

                finally:
                    context.close()

                if attempt < self.max_retries:
                    delay = 2 ** attempt
                    logger.info(f"Waiting {delay}s before retry...")
                    time.sleep(delay)

            browser.close()
            return AgentResult(
                url=url, success=False,
                error="Max retries exceeded",
                attempts=self.max_retries,
            )

# Usage
agent = RobustBrowserAgent(max_retries=3, timeout=10000)

def scrape_task(page):
    return {
        "title": page.title(),
        "heading": page.locator("h1").text_content(),
    }

result = agent.execute("https://example.com", scrape_task)
if result.success:
    print(f"Success after {result.attempts} attempt(s): {result.data}")
else:
    print(f"Failed: {result.error}")

FAQ

How should I handle CAPTCHAs in my AI agent?

CAPTCHAs are specifically designed to block automation. Options include: using CAPTCHA-solving services (like 2Captcha or Anti-Captcha), switching to an official API if the site provides one, or escalating to a human operator. Some CAPTCHAs can be avoided by using residential proxies, maintaining realistic browsing patterns, and keeping session cookies. Never attempt to bypass CAPTCHAs on sites where you do not have permission to automate.

What is the right retry count for production agents?

Three retries with exponential backoff (2s, 4s, 8s) works well for most scenarios. For critical tasks, increase to 5 retries. For bulk scraping where individual failures are acceptable, use 2 retries to optimize throughput. Always set a circuit breaker — if more than 50 percent of requests fail in a window, pause the agent and alert an operator rather than continuing to hammer a broken or blocking site.
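
A circuit breaker of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions: the window is the last N requests rather than a time span, the class and parameter names are hypothetical, and recovery logic (a half-open state) is left out.

```python
from collections import deque

class CircuitBreaker:
    """Opens when the failure rate over the last `window` results exceeds `threshold`."""

    def __init__(self, window: int = 20, threshold: float = 0.5, min_samples: int = 10):
        # True = success, False = failure; old results fall off automatically
        self.results = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def is_open(self) -> bool:
        # Avoid tripping on the first one or two failures
        if len(self.results) < self.min_samples:
            return False
        failures = self.results.count(False)
        return failures / len(self.results) > self.threshold
```

A calling loop would record each result after agent.execute returns and check is_open before the next URL, pausing and alerting an operator when it is True.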

How do I distinguish between transient and permanent failures?

Network errors (net::ERR_CONNECTION_TIMED_OUT, net::ERR_CONNECTION_RESET) are typically transient and worth retrying. DNS failures (net::ERR_NAME_NOT_RESOLVED) are usually permanent. HTTP 404 and 410 responses are permanent. HTTP 429 (rate limited) and 503 (service unavailable) are transient. Element-not-found errors may be permanent if the page structure changed, or transient if the page had not finished loading. Log the specific error type and use it to decide whether to retry.
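
The rules in this answer can be collected into one decision function. This is a minimal sketch: should_retry and the constant names are hypothetical, the lists cover only the codes named above, and treating every unlisted error as retryable is an assumption you may want to tighten.

```python
from typing import Optional

TRANSIENT_NET_ERRORS = ("net::ERR_CONNECTION_TIMED_OUT", "net::ERR_CONNECTION_RESET")
PERMANENT_NET_ERRORS = ("net::ERR_NAME_NOT_RESOLVED",)
TRANSIENT_STATUS = {429, 503}
PERMANENT_STATUS = {404, 410}

def should_retry(error_message: str = "", status: Optional[int] = None) -> bool:
    """Decide whether a failure is worth another attempt."""
    if status in PERMANENT_STATUS:
        return False
    if status in TRANSIENT_STATUS:
        return True
    if any(code in error_message for code in PERMANENT_NET_ERRORS):
        return False
    if any(code in error_message for code in TRANSIENT_NET_ERRORS):
        return True
    # Unlisted cases (for example element-not-found) are ambiguous;
    # this sketch retries them and relies on the retry limit as the cap.
    return True
```

The HTTP status is available from the response object that page.goto returns, when it returns one, via its status attribute.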


Written by

CallSphere Team
