Apify's no-code actors are powerful on their own, but combining them with Python unlocks a much deeper level of automation. You get Apify's reliable infrastructure, anti-bot handling, and 3,000+ ready-made actors, plus Python's full flexibility for processing, transforming, scheduling, and routing data wherever it needs to go.
This guide shows you exactly how to control Apify from Python, with practical code examples.
Start with a free account and $5 in credits; no credit card required.
Installing the Apify Python Client
pip install apify-client
You'll also need your Apify API token, which you can find in your account settings under Integrations → API token.
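Rather than hard-coding the token into scripts (easy to leak via version control), it's safer to read it from an environment variable. A minimal sketch; the variable name APIFY_API_TOKEN and the helper function are my own convention, not anything the client library requires:

```python
import os

def get_apify_token():
    # Read the token from the environment instead of hard-coding it
    token = os.environ.get("APIFY_API_TOKEN")
    if not token:
        raise RuntimeError("Set the APIFY_API_TOKEN environment variable")
    return token
```

You can then initialize the client with ApifyClient(get_apify_token()).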
Running an Actor via Python
This example runs the Google Maps Scraper actor and returns results as a Python list:
from apify_client import ApifyClient
# Initialize with your API token
client = ApifyClient("YOUR_APIFY_API_TOKEN")
# Define actor input
run_input = {
    "searchStringsArray": ["restaurants in New York"],
    "maxCrawledPlaces": 100,
    "language": "en",
}
# Run the actor and wait for it to finish
run = client.actor("compass/crawler-google-places").call(run_input=run_input)
# Fetch results from the dataset
results = []
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    results.append(item)

print(f"Scraped {len(results)} places")
if results:
    print(results[0])  # Inspect the first result
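call() blocks until the run finishes, but a finished run isn't necessarily a successful one: the returned run object carries a status field (e.g. "SUCCEEDED", "FAILED", "TIMED-OUT") worth checking before you trust the dataset. A small sketch using made-up run dicts for illustration:

```python
def dataset_id_if_succeeded(run):
    # Apify run objects report a status such as "SUCCEEDED" or "FAILED";
    # only return the dataset ID when the run actually succeeded
    if run.get("status") == "SUCCEEDED":
        return run.get("defaultDatasetId")
    return None

# Hypothetical run objects for illustration
ok = {"status": "SUCCEEDED", "defaultDatasetId": "abc123"}
bad = {"status": "FAILED"}
```

In the script above, you would call dataset_id_if_succeeded(run) and skip the dataset fetch when it returns None.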
Saving Results to CSV
import pandas as pd
df = pd.DataFrame(results)
# Keep only relevant columns
columns = ["title", "address", "phone", "website", "totalScore", "reviewsCount"]
df = df[[col for col in columns if col in df.columns]]
df.to_csv("google_maps_results.csv", index=False)
print(f"Saved {len(df)} rows to google_maps_results.csv")
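If you'd rather not pull in pandas for a simple export, the standard library's csv module can do the same column filtering. A minimal sketch; the sample rows are invented for illustration:

```python
import csv

def save_to_csv(results, path, columns):
    # extrasaction="ignore" drops any keys not listed in `columns`
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)

# Made-up sample rows standing in for scraped results
sample = [
    {"title": "Joe's Pizza", "address": "7 Carmine St", "phone": "+1 212-366-1182", "rank": 1},
    {"title": "Katz's Delicatessen", "address": "205 E Houston St", "phone": "+1 212-254-2246", "rank": 2},
]
save_to_csv(sample, "google_maps_results.csv", ["title", "address", "phone"])
```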
Saving to PostgreSQL
import psycopg2
from psycopg2.extras import execute_values
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
cur = conn.cursor()
rows = [(r.get("title"), r.get("address"), r.get("phone"), r.get("website")) for r in results]
execute_values(cur, """
    INSERT INTO leads (name, address, phone, website)
    VALUES %s
    ON CONFLICT (phone) DO NOTHING
""", rows)
conn.commit()
cur.close()
conn.close()
print(f"Inserted up to {len(rows)} rows into PostgreSQL (duplicates skipped)")
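For very large result sets, inserting everything in a single statement can hold a long transaction; a common pattern is to commit in batches. The chunking helper below is plain Python and the batch size of 1,000 is just an assumption, not a psycopg2 requirement:

```python
def chunked(rows, size):
    # Yield successive slices of `rows` with at most `size` elements each
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# In the insert code above, you would run execute_values once per batch:
# for batch in chunked(rows, 1000):
#     execute_values(cur, "INSERT INTO leads (name, address, phone, website) "
#                         "VALUES %s ON CONFLICT (phone) DO NOTHING", batch)
#     conn.commit()
```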
Scheduling Runs Automatically
For clients who need fresh data on a recurring schedule, I use Python's schedule library or a cron job:
import schedule
import time
from apify_client import ApifyClient
def run_scraper():
    client = ApifyClient("YOUR_APIFY_API_TOKEN")
    run = client.actor("compass/crawler-google-places").call(
        run_input={
            "searchStringsArray": ["lawyers in Chicago"],
            "maxCrawledPlaces": 100,
        }
    )
    print(f"Run complete; dataset: {run['defaultDatasetId']}")
# Run every Monday at 7am
schedule.every().monday.at("07:00").do(run_scraper)
while True:
    schedule.run_pending()
    time.sleep(60)
Alternatively, use Apify's built-in scheduler (no extra code needed) for simple cron-based scheduling from the dashboard.
Using Webhooks for Event-Driven Pipelines
Instead of polling for results, Apify can POST a webhook to your server when a run finishes:
from flask import Flask, request
app = Flask(__name__)
@app.route("/webhook/apify", methods=["POST"])
def apify_webhook():
    data = request.get_json()
    dataset_id = data["resource"]["defaultDatasetId"]
    print(f"Run finished. Dataset: {dataset_id}")
    # Trigger your processing pipeline here
    return "", 200
Set the webhook URL in your actor's configuration. This is ideal for production pipelines where you don't want a Python process running 24/7 just to poll for completion.
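Webhook endpoints receive whatever is POSTed to them, so it pays to validate the payload instead of indexing into it directly. A small helper, assuming the payload shape shown above (the run object nested under "resource"):

```python
def parse_apify_webhook(payload):
    # The run object arrives under "resource"; guard against malformed posts
    resource = (payload or {}).get("resource") or {}
    dataset_id = resource.get("defaultDatasetId")
    if not dataset_id:
        raise ValueError("Webhook payload has no defaultDatasetId")
    return dataset_id
```

Inside the Flask route, you would wrap the call in try/except and return a 400 on ValueError rather than letting the handler crash.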
When to Use Apify vs. a Custom Scraper
After 155+ scraping projects, here's my decision framework:
- Use Apify when: An actor already exists for your target site, you need results fast, the client needs a self-service solution they can run themselves, or you're prototyping and want to validate the data before investing in custom code
- Build a custom scraper when: No suitable actor exists, you need very specific data transformations or site-specific logic, or you're scraping at extreme scale (millions of pages/day) where per-run costs matter
- Hybrid approach (my default): Use Apify actors for data collection, since they handle anti-bot measures and infrastructure. Use Python for processing, enrichment, storage, and downstream integrations. This gives the best of both worlds
Tips from 155+ Projects
- Always check the Apify Store before writing a scraper; it saves hours of work
- Use client.dataset(id).iterate_items() for very large datasets; it streams items instead of loading everything into memory
- Store actor run IDs and timestamps in a database so you can replay or debug later
- Combine multiple actors in a sequence: Maps scraper → contact scraper → email verifier → CSV export
- Use the proxy rotation built into Apify actors; there's no need to manage your own proxy pool
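The run-logging tip above takes only a few lines with SQLite. A minimal sketch; the table name actor_runs and the log_run helper are my own choices:

```python
import sqlite3
from datetime import datetime, timezone

def log_run(db_path, run_id, actor_id, dataset_id):
    # One row per run; re-logging the same run_id just refreshes the row
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actor_runs (
               run_id TEXT PRIMARY KEY,
               actor_id TEXT,
               dataset_id TEXT,
               finished_at TEXT
           )"""
    )
    conn.execute(
        "INSERT OR REPLACE INTO actor_runs VALUES (?, ?, ?, ?)",
        (run_id, actor_id, dataset_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    conn.close()
```

Call it right after each .call() with run["id"] and run["defaultDatasetId"], and you can always look up which dataset belongs to which run when debugging later.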
If you need a custom Apify + Python pipeline (scheduled scraping, database integration, or full data workflows), I build these for clients starting at $20/hr.
I build Apify + Python integrations, from simple one-off scripts to production data pipelines.