---
title: "Building a Price Monitoring Agent: Automated Price Tracking Across E-Commerce Sites"
description: "Build a production-grade price monitoring agent that scrapes multiple e-commerce sites, extracts prices with AI, detects changes, sends alerts, and maintains a historical price database for trend analysis."
canonical: https://callsphere.ai/blog/building-price-monitoring-agent-ecommerce-tracking
category: "Learn Agentic AI"
tags: ["Price Monitoring", "Web Scraping", "E-Commerce", "Automation", "AI Agents", "Data Extraction"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-06T01:02:46.077Z
---

# Building a Price Monitoring Agent: Automated Price Tracking Across E-Commerce Sites

> Build a production-grade price monitoring agent that scrapes multiple e-commerce sites, extracts prices with AI, detects changes, sends alerts, and maintains a historical price database for trend analysis.

## Why Price Monitoring Needs AI

Traditional price scrapers rely on CSS selectors or XPath expressions to extract price values from product pages. This works until the site redesigns its layout, introduces dynamic pricing loaded via JavaScript, or renders prices inside images. AI-powered price monitoring agents solve these problems by using language models to interpret page content semantically rather than structurally.

A production price monitoring agent needs five capabilities: multi-site scraping with site-specific adapters, intelligent price extraction that handles edge cases, change detection with configurable thresholds, alerting through multiple channels, and historical price storage for trend analysis.

## Core Data Model

Start with a clean data model that separates the concepts of products, price snapshots, and alert rules.

```mermaid
flowchart LR
    SCHED(["Scheduler"])
    subgraph SCRAPE["Scraping Engine"]
        HTTP["HTTP fetch
httpx"]
        BROWSER["Headless browser
Playwright"]
    end
    subgraph EXTRACT["Price Extraction"]
        LLM{"LLM extraction"}
        REGEX["Regex fallback"]
    end
    subgraph STORE["Price Database"]
        PROD[("products")]
        SNAP[("price_snapshots")]
    end
    subgraph ALERT["Alerting"]
        DETECT["Change Detector"]
        RULES[("alert rules")]
        NOTIFY(["Email / Slack / SMS"])
    end
    SCHED --> HTTP
    HTTP -->|JS-rendered| BROWSER
    HTTP --> LLM
    BROWSER --> LLM
    LLM -->|LLM unavailable| REGEX
    LLM --> SNAP
    REGEX --> SNAP
    SNAP --> DETECT
    PROD --> DETECT
    RULES --> DETECT
    DETECT --> NOTIFY
    style SCHED fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style NOTIFY fill:#059669,stroke:#047857,color:#fff
```

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import sqlite3
import json

@dataclass
class Product:
    id: str
    name: str
    url: str
    site: str
    current_price: Optional[float] = None
    currency: str = "USD"
    last_checked: Optional[datetime] = None

@dataclass
class PriceSnapshot:
    product_id: str
    price: float
    currency: str
    timestamp: datetime
    raw_text: str = ""

@dataclass
class AlertRule:
    product_id: str
    condition: str  # "drop_below", "drop_percent", "any_change"
    threshold: float = 0.0
    notify_channels: list[str] = field(
        default_factory=lambda: ["email"]
    )

class PriceDatabase:
    def __init__(self, db_path: str = "prices.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_tables()

    def _init_tables(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS products (
                id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                url TEXT NOT NULL,
                site TEXT NOT NULL,
                current_price REAL,
                currency TEXT DEFAULT 'USD',
                last_checked TEXT
            );
            CREATE TABLE IF NOT EXISTS price_snapshots (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                product_id TEXT NOT NULL,
                price REAL NOT NULL,
                currency TEXT NOT NULL,
                timestamp TEXT NOT NULL,
                raw_text TEXT,
                FOREIGN KEY (product_id) REFERENCES products(id)
            );
            CREATE INDEX IF NOT EXISTS idx_snapshots_product_time
                ON price_snapshots(product_id, timestamp);
        """)

    def record_price(self, snapshot: PriceSnapshot):
        self.conn.execute(
            "INSERT INTO price_snapshots "
            "(product_id, price, currency, timestamp, raw_text) "
            "VALUES (?, ?, ?, ?, ?)",
            (snapshot.product_id, snapshot.price, snapshot.currency,
             snapshot.timestamp.isoformat(), snapshot.raw_text),
        )
        self.conn.execute(
            "UPDATE products SET current_price = ?, last_checked = ? "
            "WHERE id = ?",
            (snapshot.price, snapshot.timestamp.isoformat(),
             snapshot.product_id),
        )
        self.conn.commit()
```
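The schema above can be exercised end to end with plain `sqlite3`. The sketch below mirrors the tables, records two snapshots for a hypothetical product (the SKU, name, and prices are made-up sample data), and reads back the price history; because timestamps are stored as ISO-8601 strings, sorting them lexicographically is also chronological order:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id TEXT PRIMARY KEY, name TEXT NOT NULL, url TEXT NOT NULL,
        site TEXT NOT NULL, current_price REAL,
        currency TEXT DEFAULT 'USD', last_checked TEXT
    );
    CREATE TABLE price_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id TEXT NOT NULL, price REAL NOT NULL,
        currency TEXT NOT NULL, timestamp TEXT NOT NULL, raw_text TEXT
    );
""")

# Register a product and record two snapshots a day apart.
conn.execute(
    "INSERT INTO products (id, name, url, site) VALUES (?, ?, ?, ?)",
    ("sku-1", "Wireless Mouse", "https://example.com/p/1", "example"),
)
now = datetime(2026, 3, 18, 12, 0)
for days_ago, price in [(1, 29.99), (0, 24.99)]:
    ts = (now - timedelta(days=days_ago)).isoformat()
    conn.execute(
        "INSERT INTO price_snapshots (product_id, price, currency, timestamp) "
        "VALUES (?, ?, ?, ?)",
        ("sku-1", price, "USD", ts),
    )

# Price history, oldest first -- ISO-8601 strings sort chronologically.
history = conn.execute(
    "SELECT price FROM price_snapshots WHERE product_id = ? ORDER BY timestamp",
    ("sku-1",),
).fetchall()
print([p for (p,) in history])  # [29.99, 24.99]
```

The `(product_id, timestamp)` index in the full schema makes exactly this per-product history query cheap as snapshots accumulate.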

## AI-Powered Price Extraction

The key differentiator of an AI-powered price monitor is its ability to extract prices from any page without hand-crafted selectors. The agent sends the page content to an LLM and asks it to identify the current selling price.

```python
from openai import AsyncOpenAI
from typing import Optional
import json
import re

class AIPriceExtractor:
    def __init__(self, client: AsyncOpenAI):
        self.client = client

    async def extract_price(self, page_text: str,
                             product_name: str) -> dict:
        """Extract price from page text using LLM."""
        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "Extract the current selling price from the "
                    "product page text. Return JSON with keys: "
                    "price (float), currency (string), "
                    "original_text (the raw price string). "
                    "If there is a sale price, use the sale price."
                )},
                {"role": "user", "content": (
                    f"Product: {product_name}\n\n"
                    f"Page content:\n{page_text[:3000]}"
                )},
            ],
            response_format={"type": "json_object"},
            temperature=0,
        )

        result = json.loads(response.choices[0].message.content)
        return result

    def parse_price_fallback(self, text: str) -> Optional[float]:
        """Regex fallback when LLM is unavailable."""
        patterns = [
            r'\$[\d,]+\.?\d*',
            r'USD\s*[\d,]+\.?\d*',
            r'Price:\s*[\d,]+\.?\d*',
        ]
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                price_str = re.sub(r'[^\d.]', '', match.group())
                return float(price_str)
        return None
```
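The fallback patterns are worth testing standalone against representative price strings, including ones with thousands separators (the sample strings below are invented for the demo):

```python
import re
from typing import Optional

# Same fallback patterns as in AIPriceExtractor.parse_price_fallback.
PATTERNS = [
    r'\$[\d,]+\.?\d*',         # $1,299.99
    r'USD\s*[\d,]+\.?\d*',     # USD 49.95
    r'Price:\s*[\d,]+\.?\d*',  # Price: 19.99
]

def parse_price(text: str) -> Optional[float]:
    for pattern in PATTERNS:
        match = re.search(pattern, text)
        if match:
            # Strip currency symbols and thousands separators.
            return float(re.sub(r'[^\d.]', '', match.group()))
    return None

print(parse_price("Now only $1,299.99 while stocks last"))  # 1299.99
print(parse_price("USD 49.95 incl. tax"))                   # 49.95
print(parse_price("Currently unavailable"))                 # None
```

Note the character-class stripping (`[^\d.]`) is what makes `$1,299.99` parse cleanly: the comma is removed before `float()` sees the string.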

## Multi-Site Scraping Engine

Each e-commerce site has different loading behavior, anti-bot measures, and page structures. The scraping engine uses Playwright for JavaScript-heavy sites and falls back to HTTP requests for static pages.

```python
from datetime import datetime
from playwright.async_api import async_playwright
import httpx
import re

class PriceScraper:
    def __init__(self, extractor: AIPriceExtractor):
        self.extractor = extractor
        self.http_client = httpx.AsyncClient(
            timeout=30,
            headers={"User-Agent": (
                "Mozilla/5.0 (compatible; PriceMonitor/1.0)"
            )},
        )

    async def scrape_product(self, product: Product) -> PriceSnapshot:
        """Scrape price for a single product."""
        # Try simple HTTP first (faster, cheaper)
        try:
            page_text = await self._fetch_http(product.url)
            if self._looks_like_price_page(page_text):
                return await self._extract(product, page_text)
        except Exception:
            pass

        # Fall back to browser for JS-rendered pages
        page_text = await self._fetch_browser(product.url)
        return await self._extract(product, page_text)

    async def _fetch_http(self, url: str) -> str:
        resp = await self.http_client.get(url)
        resp.raise_for_status()
        return resp.text

    async def _fetch_browser(self, url: str) -> str:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            text = await page.inner_text("body")
            await browser.close()
            return text

    async def _extract(self, product: Product,
                        page_text: str) -> PriceSnapshot:
        result = await self.extractor.extract_price(
            page_text, product.name
        )
        return PriceSnapshot(
            product_id=product.id,
            price=result["price"],
            currency=result.get("currency", product.currency),
            timestamp=datetime.utcnow(),
            raw_text=result.get("original_text", ""),
        )

    def _looks_like_price_page(self, html: str) -> bool:
        """Quick check if HTTP response has price-like content."""
        return bool(re.search(r'[$€£]\s*\d', html))
```

## Change Detection and Alerting

The change detection layer compares each new price snapshot against the previous one and evaluates alert rules to determine if a notification should be sent.

```python
class ChangeDetector:
    def __init__(self, db: PriceDatabase):
        self.db = db

    def check_alerts(self, product: Product,
                     new_price: float,
                     rules: list[AlertRule]) -> list[dict]:
        """Evaluate alert rules against price change."""
        previous = product.current_price
        if previous is None:
            return []

        alerts = []
        for rule in rules:
            # Only evaluate rules attached to this product.
            if rule.product_id != product.id:
                continue
            triggered = False

            if rule.condition == "any_change" and new_price != previous:
                triggered = True
            elif rule.condition == "drop_below":
                triggered = new_price <= rule.threshold
            elif rule.condition == "drop_percent" and previous > 0:
                drop_pct = (previous - new_price) / previous * 100
                triggered = drop_pct >= rule.threshold

            if triggered:
                alerts.append({
                    "product": product.name,
                    "old_price": previous,
                    "new_price": new_price,
                    "rule": rule.condition,
                    "channels": rule.notify_channels,
                })

        return alerts
```
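The rule evaluation is easy to unit-test in isolation. A minimal sketch of just the condition logic, assuming `drop_percent` treats the threshold as a percentage drop relative to the previous price (the SKU and prices are sample data):

```python
from dataclasses import dataclass, field

@dataclass
class AlertRule:
    product_id: str
    condition: str  # "drop_below", "drop_percent", "any_change"
    threshold: float = 0.0
    notify_channels: list[str] = field(default_factory=lambda: ["email"])

def rule_triggered(rule: AlertRule, previous: float, new_price: float) -> bool:
    """Evaluate a single rule against an old/new price pair."""
    if rule.condition == "any_change":
        return new_price != previous
    if rule.condition == "drop_below":
        return new_price <= rule.threshold
    if rule.condition == "drop_percent":
        # Threshold is interpreted as a percentage of the previous price.
        return previous > 0 and \
            (previous - new_price) / previous * 100 >= rule.threshold
    return False

# A 79.99 -> 59.99 drop is about 25%, so a 20% drop_percent rule fires.
rule = AlertRule("sku-1", "drop_percent", threshold=20.0)
print(rule_triggered(rule, 79.99, 59.99))  # True
print(rule_triggered(rule, 79.99, 69.99))  # False
```

Keeping this logic pure (no database access) is what makes it testable without fixtures; `ChangeDetector` only adds the previous-price lookup around it.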

## Running the Monitor on a Schedule

Tie everything together with an async scheduler that runs price checks at configurable intervals.

```python
import asyncio

async def run_price_monitor(products: list[Product],
                            rules: list[AlertRule],
                            interval_minutes: int = 60):
    """Main monitoring loop."""
    db = PriceDatabase()
    extractor = AIPriceExtractor(AsyncOpenAI())
    scraper = PriceScraper(extractor)
    detector = ChangeDetector(db)

    while True:
        for product in products:
            try:
                snapshot = await scraper.scrape_product(product)
                alerts = detector.check_alerts(
                    product, snapshot.price, rules
                )
                db.record_price(snapshot)
                # Keep the in-memory product in sync so the next cycle
                # compares against the latest recorded price.
                product.current_price = snapshot.price
                product.last_checked = snapshot.timestamp

                for alert in alerts:
                    await send_notification(alert)

            except Exception as e:
                print(f"Error checking {product.name}: {e}")

        await asyncio.sleep(interval_minutes * 60)
```
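The loop above calls `send_notification`, which is left undefined. A minimal sketch of what it might look like, dispatching one message per configured channel; the channel names and message format are illustrative, and real senders (SMTP, a Slack webhook, an SMS API) would replace the commented branch:

```python
import asyncio

async def send_notification(alert: dict) -> list[str]:
    """Dispatch a price alert to each configured channel.

    Returns the formatted messages so callers (and tests) can
    inspect what would have been sent.
    """
    message = (
        f"{alert['product']}: "
        f"{alert['old_price']:.2f} -> {alert['new_price']:.2f} "
        f"({alert['rule']})"
    )
    delivered = []
    for channel in alert.get("channels", ["email"]):
        # e.g. if channel == "email": await smtp_send(message)
        delivered.append(f"[{channel}] {message}")
    return delivered

alert = {
    "product": "Wireless Mouse", "old_price": 29.99,
    "new_price": 24.99, "rule": "drop_below",
    "channels": ["email", "slack"],
}
result = asyncio.run(send_notification(alert))
print(result)
```

Returning the rendered messages rather than only side-effecting keeps the function easy to test and to log.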

## FAQ

### How do I avoid getting blocked by e-commerce sites?

Respect robots.txt directives, use reasonable request intervals (at least 30-60 seconds between requests to the same domain), rotate user agents, and consider using the site's official API or affiliate feeds when available. For production use, services like ScrapingBee or Browserless can handle anti-bot measures.
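The per-domain spacing can be enforced with a small rate limiter in front of the scraper. A sketch (the 30-second default matches the guidance above; the demo uses a short interval so it runs quickly, and the URLs are placeholders):

```python
import asyncio
import time
from urllib.parse import urlparse

class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_interval: float = 30.0):
        self.min_interval = min_interval
        self._last_request: dict[str, float] = {}

    async def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        last = self._last_request.get(domain)
        if last is not None:
            elapsed = time.monotonic() - last
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
        self._last_request[domain] = time.monotonic()

async def demo() -> float:
    limiter = DomainRateLimiter(min_interval=0.2)  # short for the demo
    start = time.monotonic()
    for url in ["https://shop-a.example/p/1", "https://shop-a.example/p/2"]:
        await limiter.wait(url)
    return time.monotonic() - start

elapsed = asyncio.run(demo())
print(elapsed >= 0.2)  # second request to the same domain had to wait
```

Calling `await limiter.wait(product.url)` at the top of `scrape_product` would apply this to every fetch, while requests to different domains proceed without delay.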

### How accurate is LLM-based price extraction compared to CSS selectors?

LLM extraction is more robust across different sites but slightly less precise on well-structured pages. The best approach is a hybrid: maintain CSS selectors for your highest-volume sites and use LLM extraction as a fallback and for new sites. Test extraction accuracy regularly against a ground truth dataset.
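The hybrid dispatch can be a thin lookup layer: try a registered site-specific extractor first, fall back to the LLM when the site is unknown or the selector misses. Everything site-specific here is hypothetical (`example-shop`, the `data-price` attribute, and the stub fallback stand in for real sites and for `AIPriceExtractor`):

```python
import re
from typing import Callable, Optional

# Hypothetical per-site extractors for high-volume sites. In production
# these would use a proper HTML parser and maintained selectors.
SITE_EXTRACTORS: dict[str, Callable[[str], Optional[float]]] = {
    "example-shop": lambda html: (
        float(m.group(1))
        if (m := re.search(r'data-price="([\d.]+)"', html))
        else None
    ),
}

def extract_price(site: str, html: str,
                  llm_fallback: Callable[[str], float]) -> float:
    """Try the site-specific extractor first, then fall back to the LLM."""
    site_fn = SITE_EXTRACTORS.get(site)
    if site_fn is not None:
        price = site_fn(html)
        if price is not None:
            return price
    return llm_fallback(html)

html = '<span class="price" data-price="49.99">$49.99</span>'
# Stub fallback for the demo; in practice this calls AIPriceExtractor.
print(extract_price("example-shop", html, llm_fallback=lambda _: -1.0))
print(extract_price("unknown-site", html, llm_fallback=lambda _: 39.0))
```

Because the fallback also fires when a registered selector returns `None`, a site redesign degrades gracefully to LLM extraction instead of silently reporting no price.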

### How should I store historical price data at scale?

For small to medium volumes (thousands of products), SQLite or PostgreSQL with a time-indexed snapshots table works well. For larger volumes, consider a time-series database like TimescaleDB, which is PostgreSQL-compatible but optimized for time-series queries and data retention policies.

---

#PriceMonitoring #WebScraping #ECommerce #AIAgents #DataExtraction #PriceTracking #Playwright #Automation

---

Source: https://callsphere.ai/blog/building-price-monitoring-agent-ecommerce-tracking
