A client in the retail analytics space needed real-time price intelligence across 20 major e-commerce platforms. The challenge: dynamic JS-rendered pages, aggressive anti-bot systems, and a need to process over 500,000 product listings every day.
What I Built
I designed a distributed scraping pipeline using Scrapy for the core framework and Playwright for JavaScript-heavy pages. Celery workers manage job queuing, and a PostgreSQL database stores the full price history with indexed lookups for instant comparison queries.
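At this volume, listings have to be partitioned across many workers. One simple way to do that is hash-based sharding of listing URLs onto a fixed set of worker queues, which keeps the same product on the same worker across runs. This is an illustrative sketch, not the production routing logic, and the queue names and shard count are made up for the example:

```python
import hashlib

NUM_QUEUES = 8  # illustrative; the real deployment's worker count differs


def queue_for(url: str) -> str:
    """Deterministically map a listing URL to a worker queue.

    Hashing the URL (rather than round-robin) pins each product to one
    worker, which makes per-site rate limiting and caching easier.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % NUM_QUEUES
    return f"scrape-queue-{shard}"


# Same URL always lands on the same queue:
assert queue_for("https://example.com/p/123") == queue_for("https://example.com/p/123")
```

In a Celery setup, the returned queue name would be passed as the `queue` argument when dispatching the scrape task.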
Key Features
- Automatic change detection: records are stored only when price or availability changes, keeping the DB lean
- Proxy rotation with residential IPs to avoid rate limits and blocks
- Scheduled daily runs via Celery Beat with retry logic on failure
- REST API layer so the client dashboard can query price trends over any time range
Results
The system has been running in production for 8+ months with 99.7% uptime, processing roughly 520,000 listings per 24-hour cycle. The client now uses it as the backbone of their competitive pricing strategy.