Apify's no-code actors are powerful on their own, but combining them with Python unlocks a much deeper level of automation. You get Apify's reliable infrastructure, anti-bot handling, and 3,000+ ready-made actors, plus Python's full flexibility for processing, transforming, scheduling, and routing data wherever it needs to go.
This guide shows you exactly how to control Apify from Python, with practical code examples.
Start with a free account and $5 in credits; no credit card required.
Installing the Apify Python Client
pip install apify-client
You'll also need your Apify API token, which you can find in your account settings under Integrations → API token.
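Rather than hard-coding the token into scripts (easy to leak via version control), it's safer to read it from an environment variable. A minimal sketch; the variable name APIFY_API_TOKEN and the helper function are my own convention, not anything the client library requires:

```python
import os

def get_apify_token():
    # Read the token from the environment instead of hard-coding it
    token = os.environ.get("APIFY_API_TOKEN")
    if not token:
        raise RuntimeError("Set the APIFY_API_TOKEN environment variable")
    return token
```

You can then initialize the client with ApifyClient(get_apify_token()).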
Running an Actor via Python
This example runs the Google Maps Scraper actor and returns results as a Python list:
from apify_client import ApifyClient
# Initialize with your API token
client = ApifyClient("YOUR_APIFY_API_TOKEN")
# Define actor input
run_input = {
    "searchStringsArray": ["restaurants in New York"],
    "maxCrawledPlaces": 100,
    "language": "en",
}
# Run the actor and wait for it to finish
run = client.actor("compass/crawler-google-places").call(run_input=run_input)
# Fetch results from the dataset
results = []
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    results.append(item)

print(f"Scraped {len(results)} places")
if results:
    print(results[0])  # Inspect the first result
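call() blocks until the run finishes, but a finished run isn't necessarily a successful one: the returned run object carries a status field (e.g. "SUCCEEDED", "FAILED", "TIMED-OUT") worth checking before you trust the dataset. A small sketch using made-up run dicts for illustration:

```python
def dataset_id_if_succeeded(run):
    # Apify run objects report a status such as "SUCCEEDED" or "FAILED";
    # only return the dataset ID when the run actually succeeded
    if run.get("status") == "SUCCEEDED":
        return run.get("defaultDatasetId")
    return None

# Hypothetical run objects for illustration
ok = {"status": "SUCCEEDED", "defaultDatasetId": "abc123"}
bad = {"status": "FAILED"}
```

In the script above, you would call dataset_id_if_succeeded(run) and skip the dataset fetch when it returns None.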
Saving Results to CSV
import pandas as pd
df = pd.DataFrame(results)
# Keep only relevant columns
columns = ["title", "address", "phone", "website", "totalScore", "reviewsCount"]
df = df[[col for col in columns if col in df.columns]]
df.to_csv("google_maps_results.csv", index=False)
print(f"Saved {len(df)} rows to google_maps_results.csv")
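If you'd rather not pull in pandas for a simple export, the standard library's csv module can do the same column filtering. A minimal sketch; the sample rows are invented for illustration:

```python
import csv

def save_to_csv(results, path, columns):
    # extrasaction="ignore" drops any keys not listed in `columns`
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)

# Made-up sample rows standing in for scraped results
sample = [
    {"title": "Joe's Pizza", "address": "7 Carmine St", "phone": "+1 212-366-1182", "rank": 1},
    {"title": "Katz's Delicatessen", "address": "205 E Houston St", "phone": "+1 212-254-2246", "rank": 2},
]
save_to_csv(sample, "google_maps_results.csv", ["title", "address", "phone"])
```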
Saving to PostgreSQL
import psycopg2
from psycopg2.extras import execute_values
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
cur = conn.cursor()
rows = [(r.get("title"), r.get("address"), r.get("phone"), r.get("website")) for r in results]
execute_values(cur, """
    INSERT INTO leads (name, address, phone, website)
    VALUES %s
    ON CONFLICT (phone) DO NOTHING
""", rows)
conn.commit()
cur.close()
conn.close()
print(f"Inserted up to {len(rows)} rows into PostgreSQL (duplicates skipped)")
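For very large result sets, inserting everything in a single statement can hold a long transaction; a common pattern is to commit in batches. The chunking helper below is plain Python and the batch size of 1,000 is just an assumption, not a psycopg2 requirement:

```python
def chunked(rows, size):
    # Yield successive slices of `rows` with at most `size` elements each
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# In the insert code above, you would run execute_values once per batch:
# for batch in chunked(rows, 1000):
#     execute_values(cur, "INSERT INTO leads (name, address, phone, website) "
#                         "VALUES %s ON CONFLICT (phone) DO NOTHING", batch)
#     conn.commit()
```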
Scheduling Runs Automatically
For clients who need fresh data on a recurring schedule, I use Python's schedule library or a cron job:
import schedule
import time
from apify_client import ApifyClient
def run_scraper():
    client = ApifyClient("YOUR_APIFY_API_TOKEN")
    run = client.actor("compass/crawler-google-places").call(
        run_input={
            "searchStringsArray": ["lawyers in Chicago"],
            "maxCrawledPlaces": 100,
        }
    )
    print(f"Run complete; dataset: {run['defaultDatasetId']}")
# Run every Monday at 7am
schedule.every().monday.at("07:00").do(run_scraper)
while True:
    schedule.run_pending()
    time.sleep(60)
Alternatively, use Apify's built-in scheduler (no extra code needed) for simple cron-based scheduling from the dashboard.
Using Webhooks for Event-Driven Pipelines
Instead of polling for results, Apify can POST a webhook to your server when a run finishes:
from flask import Flask, request
app = Flask(__name__)
@app.route("/webhook/apify", methods=["POST"])
def apify_webhook():
    data = request.get_json()
    dataset_id = data["resource"]["defaultDatasetId"]
    print(f"Run finished. Dataset: {dataset_id}")
    # Trigger your processing pipeline here
    return "", 200
Set the webhook URL in your actor's configuration. This is ideal for production pipelines where you don't want a Python process running 24/7 just to poll for completion.
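Webhook endpoints receive whatever is POSTed to them, so it pays to validate the payload instead of indexing into it directly. A small helper, assuming the payload shape shown above (the run object nested under "resource"):

```python
def parse_apify_webhook(payload):
    # The run object arrives under "resource"; guard against malformed posts
    resource = (payload or {}).get("resource") or {}
    dataset_id = resource.get("defaultDatasetId")
    if not dataset_id:
        raise ValueError("Webhook payload has no defaultDatasetId")
    return dataset_id
```

Inside the Flask route, you would wrap the call in try/except and return a 400 on ValueError rather than letting the handler crash.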
When to Use Apify vs. a Custom Scraper
After 155+ scraping projects, here's my decision framework:
- Use Apify when: An actor already exists for your target site, you need results fast, the client needs a self-service solution they can run themselves, or you're prototyping and want to validate the data before investing in custom code
- Build a custom scraper when: No suitable actor exists, you need very specific data transformations or site-specific logic, or you're scraping at extreme scale (millions of pages/day) where per-run costs matter
- Hybrid approach (my default): Use Apify actors for data collection, since they handle anti-bot measures and infrastructure. Use Python for processing, enrichment, storage, and downstream integrations. This gives the best of both worlds
Tips from 155+ Projects
- Always check the Apify Store before writing a scraper; it saves hours of work
- Use client.dataset(id).iterate_items() for very large datasets; it streams items instead of loading everything into memory
- Store actor run IDs and timestamps in a database so you can replay or debug later
- Combine multiple actors in a sequence: Maps scraper → contact scraper → email verifier → CSV export
- Use the proxy rotation built into Apify actors; there's no need to manage your own proxy pool
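The run-logging tip above takes only a few lines with SQLite. A minimal sketch; the table name actor_runs and the log_run helper are my own choices:

```python
import sqlite3
from datetime import datetime, timezone

def log_run(db_path, run_id, actor_id, dataset_id):
    # One row per run; re-logging the same run_id just refreshes the row
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actor_runs (
               run_id TEXT PRIMARY KEY,
               actor_id TEXT,
               dataset_id TEXT,
               finished_at TEXT
           )"""
    )
    conn.execute(
        "INSERT OR REPLACE INTO actor_runs VALUES (?, ?, ?, ?)",
        (run_id, actor_id, dataset_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    conn.close()
```

Call it right after each .call() with run["id"] and run["defaultDatasetId"], and you can always look up which dataset belongs to which run when debugging later.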
If you need a custom Apify + Python pipeline (scheduled scraping, database integration, or full data workflows), I build these for clients starting at $20/hr.
I build Apify + Python integrations, from simple one-off scripts to production data pipelines.