A client in the retail analytics space needed real-time price intelligence across 20 major e-commerce platforms. The challenge: dynamic JS-rendered pages, aggressive anti-bot systems, and a need to process over 500,000 product listings every day.

What I Built

I designed a distributed scraping pipeline with Scrapy as the core framework and Playwright for JavaScript-heavy pages. Celery workers manage job queuing, and a PostgreSQL database stores the full price history, indexed for fast comparison queries.
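To illustrate the indexed price history, here is a minimal sketch of what such a table and a time-range lookup might look like. It is not the production schema: the table name, column names, and helper functions are all hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema. sqlite3 is a stand-in for PostgreSQL here.
SCHEMA = """
CREATE TABLE price_history (
    product_id  TEXT    NOT NULL,
    platform    TEXT    NOT NULL,
    price_cents INTEGER NOT NULL,
    in_stock    INTEGER NOT NULL,
    observed_at TEXT    NOT NULL  -- ISO-8601 timestamp, sorts lexicographically
);
-- Composite index so per-product, per-platform range queries avoid a full scan.
CREATE INDEX idx_price_lookup
    ON price_history (product_id, platform, observed_at);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn, product_id, platform, price_cents, in_stock, observed_at):
    """Append one observation to the price history."""
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, ?, ?)",
        (product_id, platform, price_cents, int(in_stock), observed_at),
    )


def price_trend(conn, product_id, platform, start, end):
    """Return (observed_at, price_cents) rows within [start, end], oldest first."""
    rows = conn.execute(
        "SELECT observed_at, price_cents FROM price_history "
        "WHERE product_id = ? AND platform = ? "
        "AND observed_at BETWEEN ? AND ? "
        "ORDER BY observed_at",
        (product_id, platform, start, end),
    )
    return rows.fetchall()
```

Storing prices as integer cents is an assumption made here to avoid floating-point rounding; a PostgreSQL schema could just as well use a NUMERIC column and a native timestamp type.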

Key Features

  • Automatic change detection: only stores records when price or availability changes, keeping the DB lean
  • Proxy rotation with residential IPs to avoid rate limits and blocks
  • Scheduled daily runs via Celery Beat with retry logic on failure
  • REST API layer that lets the client dashboard query price trends over any time range
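The change-detection idea from the list above can be sketched in a few lines. This is an illustration under stated assumptions rather than the production code: the Observation and ChangeDetector names are hypothetical, and the last-seen state is kept in an in-memory dict, where a real pipeline would more likely compare against the latest stored row.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One scraped listing (hypothetical shape)."""
    product_id: str
    platform: str
    price_cents: int
    in_stock: bool


class ChangeDetector:
    """Keeps an observation only when price or availability differs from the last one seen."""

    def __init__(self) -> None:
        # Last known (price, availability) per (product, platform).
        self._last: Dict[Tuple[str, str], Tuple[int, bool]] = {}
        # Stand-in for the database: observations that were actually stored.
        self.stored: List[Observation] = []

    def process(self, obs: Observation) -> bool:
        """Store obs if it represents a change; return True when stored."""
        key = (obs.product_id, obs.platform)
        state = (obs.price_cents, obs.in_stock)
        if self._last.get(key) == state:
            return False  # nothing changed, skip the write
        self._last[key] = state
        self.stored.append(obs)
        return True
```

With this approach, a listing scraped every day but repriced once a month produces roughly one row per month instead of thirty, which is what keeps the history table small.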

Results

The system has been running in production for 8+ months with 99.7% uptime, processing roughly 520,000 listings per 24-hour cycle. The client now uses it as the backbone of their competitive pricing strategy.