
PageSpeed Insights Batch Analysis Tool

A command-line tool that automates Google PageSpeed Insights analysis across multiple URLs, extracting performance metrics (lab + field data) into structured CSV, JSON, and HTML reports.

Installation

Run instantly with uvx (recommended, no install needed)

uvx pagespeed quick-check https://example.com

Install with pip or pipx

pip install pagespeed
pagespeed quick-check https://example.com

Run from URL (just needs uv)

uv run https://raw.githubusercontent.com/volkanunsal/pagespeed/main/pagespeed_insights_tool.py quick-check https://example.com

Development

git clone https://github.com/volkanunsal/pagespeed.git
cd pagespeed
uv run pagespeed_insights_tool.py quick-check https://example.com

Prerequisites

  • Python 3.13+
  • Google API key (optional) — without one, you're limited to ~25 queries per 100 seconds; with one, ~25,000 queries/day

Getting an API Key

  1. Go to the Google Cloud Console
  2. Create a new project (or select an existing one)
  3. Navigate to APIs & Services > Library
  4. Search for PageSpeed Insights API and enable it
  5. Go to APIs & Services > Credentials
  6. Click Create Credentials > API Key
  7. Copy the key and set it:
    export PAGESPEED_API_KEY=your_key_here
    Or add it to your pagespeed.toml config file (see Configuration).

Usage

quick-check — Fast single-URL spot check

Prints a formatted report to the terminal. No files written.

# Mobile only (default)
pagespeed quick-check https://www.google.com

# Both mobile and desktop
pagespeed quick-check https://www.google.com --device both

# With specific categories
pagespeed quick-check https://www.google.com --categories performance accessibility

Sample output:

============================================================
  URL:      https://www.google.com
  Strategy: mobile
============================================================
  Performance Score: 92/100 (GOOD)

  --- Lab Data ---
  First Contentful Paint............. 1200ms
  Largest Contentful Paint........... 1800ms
  Cumulative Layout Shift............ 0.0100
  Speed Index........................ 1500ms
  Total Blocking Time................ 150ms
  Time to Interactive................ 2100ms

audit — Full batch analysis

Analyzes multiple URLs and writes CSV/JSON reports.

# From a file of URLs
pagespeed audit -f urls.txt

# Multiple strategies and output formats
pagespeed audit -f urls.txt --device both --output-format both

# Inline URLs with custom output path
pagespeed audit https://a.com https://b.com -o report

# With a named profile
pagespeed audit -f urls.txt --profile full

# Piped input
cat urls.txt | pagespeed audit

# Include full Lighthouse audit data in JSON output
pagespeed audit -f urls.txt --full --output-format json

# Stream results as NDJSON to stdout as they complete
pagespeed audit -f urls.txt --stream


--full flag

Pass --full to embed the complete raw lighthouseResult object from the PageSpeed API into each result in the JSON output. This includes all Lighthouse audits, opportunities, diagnostics, and metadata — useful for deep analysis or feeding into other tools.

  • JSON: each result gains a top-level lighthouseResult key containing the full API object.
  • CSV: --full is silently ignored; the raw object is never written to CSV.
  • File naming: auto-named files get a -full suffix (e.g., 20260219T143022Z-mobile-full.json).
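As an illustration of what --full enables, the hypothetical helper below (not part of the tool) ranks the timed audits inside a result by their numericValue, a millisecond field Lighthouse attaches to many audits in the PSI v5 response:

```python
def slowest_audits(result, top=3):
    """Rank Lighthouse audits in a --full result by numericValue, descending.

    Assumes the PSI v5 shape: result["lighthouseResult"]["audits"] maps
    audit ids to objects that may carry a numeric "numericValue".
    """
    audits = result["lighthouseResult"]["audits"]
    timed = [(aid, a["numericValue"]) for aid, a in audits.items()
             if isinstance(a.get("numericValue"), (int, float))]
    return sorted(timed, key=lambda t: t[1], reverse=True)[:top]

# Abbreviated sample of a --full result:
sample = {"lighthouseResult": {"audits": {
    "largest-contentful-paint": {"numericValue": 1800},
    "first-contentful-paint": {"numericValue": 1200},
    "cumulative-layout-shift": {"score": 1},  # no numericValue: skipped
}}}
print(slowest_audits(sample, top=2))
# → [('largest-contentful-paint', 1800), ('first-contentful-paint', 1200)]
```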

--stream flag

Pass --stream to print results to stdout as NDJSON (one JSON object per line) as each URL/strategy completes, instead of buffering everything and writing files at the end. This lets you pipe results into jq, grep, or other tools without waiting for the full batch to finish.

  • Output: one json.dumps line per result written to stdout immediately on completion.
  • File output: skipped — no CSV/JSON files are written in stream mode.
  • Summary: the post-run audit summary table is suppressed (not useful when piping).
  • Progress bar: still shown on stderr so you can track progress while piping stdout.
  • Budget: still evaluated if --budget is set, using the complete result set.

# Stream all results to stdout
pagespeed audit -f urls.txt --stream

# Extract a single field from each result
pagespeed audit -f urls.txt --stream | jq '.performance_score'

# Filter to only URLs below a score threshold
pagespeed audit -f urls.txt --stream | jq 'select(.performance_score < 50)'

# Save streamed results to a file while also viewing them
pagespeed audit -f urls.txt --stream | tee results.ndjson | jq '.url'

Each NDJSON line is a flat JSON object with the same fields as a CSV row (url, strategy, performance_score, lab_fcp_ms, etc.). null is used where a value is not available.
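Because each line is self-contained JSON, streamed output is easy to post-process without jq. This illustrative helper (not part of the tool) averages the scores while skipping nulls:

```python
import json

def summarize_ndjson(lines):
    """Average performance score across streamed NDJSON results,
    skipping results whose score is null (e.g. failed requests)."""
    scores = []
    for line in lines:
        result = json.loads(line)
        score = result.get("performance_score")
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None

# Two streamed lines, one with a null score:
stream = [
    '{"url": "https://a.com", "strategy": "mobile", "performance_score": 92}',
    '{"url": "https://b.com", "strategy": "mobile", "performance_score": null}',
]
print(summarize_ndjson(stream))  # → 92.0
```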

The URL file is one URL per line. Lines starting with # are comments:

# Main pages
https://example.com
https://example.com/about
https://example.com/contact

compare — Compare two reports

Loads two previous report files and shows per-URL score changes.

# Compare before and after
pagespeed compare before.csv after.csv

# Custom threshold (flag changes >= 10%)
pagespeed compare --threshold 10 old.json new.json

Output flags regressions with !! and improvements with ++.
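The comparison logic can be sketched roughly like this (a simplified illustration, not the tool's actual implementation; inputs map url to performance_score as in a report row):

```python
def score_deltas(before, after, threshold=5.0):
    """Per-URL score change between two runs; flag moves beyond the
    threshold with '!!' (regression) or '++' (improvement)."""
    out = {}
    for url in sorted(before.keys() & after.keys()):
        delta = after[url] - before[url]
        flag = "!!" if delta <= -threshold else "++" if delta >= threshold else ""
        out[url] = (delta, flag)
    return out

deltas = score_deltas({"https://a.com": 90, "https://b.com": 60},
                      {"https://a.com": 82, "https://b.com": 70})
print(deltas["https://a.com"])  # → (-8, '!!')
print(deltas["https://b.com"])  # → (10, '++')
```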

report — Generate HTML dashboard

Creates a self-contained HTML report from a results file.

# Generate HTML from CSV results
pagespeed report results.csv

# Custom output path
pagespeed report results.json -o dashboard.html

# Auto-open in browser
pagespeed report results.csv --open

The HTML report includes:

  • Summary cards (total URLs, average/median/best/worst scores)
  • Color-coded score table (green/orange/red)
  • Core Web Vitals pass/fail indicators
  • Bar charts comparing scores across URLs
  • Field data table (when available)
  • Sortable columns (click headers)

run — Low-level direct access

Full control with every CLI flag. Same internals as audit.

pagespeed run https://example.com --device desktop --categories performance accessibility --delay 2.0

pipeline — End-to-end analysis

Resolves URLs from a sitemap (or file/inline), runs the analysis, writes CSV/JSON data files, and generates an HTML report — all in one command. Optionally evaluates a performance budget.

# From a sitemap (auto-detected from URL shape)
pagespeed pipeline https://example.com/sitemap.xml

# Limit URLs and auto-open report in browser
pagespeed pipeline https://example.com/sitemap.xml --sitemap-limit 20 --open

# Filter to a section of the sitemap, both devices
pagespeed pipeline https://example.com/sitemap.xml --sitemap-filter "/blog/" --device both

# Inline URLs
pagespeed pipeline https://a.com https://b.com --device both

# From a URL file
pagespeed pipeline -f urls.txt --open

# Data files only — skip HTML report generation
pagespeed pipeline -f urls.txt --no-report --output-format json

# Evaluate Core Web Vitals budget (exits 2 on failure)
pagespeed pipeline https://example.com/sitemap.xml --budget cwv

# Custom budget with GitHub Actions output format
pagespeed pipeline https://example.com/sitemap.xml --budget budget.toml --budget-format github

Sitemap auto-detection: when a single positional argument looks like a sitemap (ends in .xml, contains sitemap in the path, or the file content starts with <?xml), it is treated as a sitemap source automatically. Pass --sitemap explicitly to use a sitemap alongside inline URLs.
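The detection heuristic can be approximated in Python like so (an illustrative sketch; the tool's real logic may differ in details):

```python
from pathlib import Path
from urllib.parse import urlparse

def looks_like_sitemap(source: str) -> bool:
    """Approximate the auto-detection rules: an .xml suffix, 'sitemap'
    in the path, or a local file whose content starts with '<?xml'."""
    path = urlparse(source).path if "://" in source else source
    if path.endswith(".xml") or "sitemap" in path.lower():
        return True
    local = Path(source)
    if local.is_file():
        head = local.read_text(encoding="utf-8", errors="ignore")[:64]
        return head.lstrip().startswith("<?xml")
    return False

print(looks_like_sitemap("https://example.com/sitemap.xml"))  # → True
print(looks_like_sitemap("https://example.com/about"))        # → False
```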

pipeline flags

Flag               Short   Default   Description
source                     []        Sitemap URL/path (auto-detected) or plain URLs
--file             -f      None      File with one URL per line
--sitemap                  None      Explicit sitemap URL or local path
--sitemap-limit            None      Max URLs to extract from sitemap
--sitemap-filter           None      Regex to filter sitemap URLs
--open                     False     Auto-open HTML report in browser after completion
--no-report                False     Skip HTML report; write data files only
--budget                   None      Budget file (TOML) or cwv preset — exits 2 on failure
--budget-format            text      Budget output format: text, json, or github
--webhook                  None      Webhook URL for budget result notifications
--webhook-on               always    When to send webhook: always or fail

All audit flags (--device, --output-format, --output, --output-dir, --delay, --workers, --categories) also apply.

Configuration

Config file: pagespeed.toml

An optional TOML file for persistent settings and named profiles. The tool searches for it in:

  1. Current working directory (./pagespeed.toml)
  2. User config directory (~/.config/pagespeed/config.toml)

You can also pass an explicit path with --config path/to/config.toml.

[settings]
api_key = "YOUR_API_KEY"       # or use PAGESPEED_API_KEY env var
urls_file = "urls.txt"         # default URL file for -f
delay = 1.5                    # seconds between API requests
device = "mobile"              # mobile, desktop, or both
output_format = "csv"          # csv, json, or both
output_dir = "./reports"       # directory for output files
workers = 4                    # concurrent workers (1 = sequential)
categories = ["performance"]   # Lighthouse categories
verbose = false

[profiles.quick]
device = "mobile"
output_format = "csv"
categories = ["performance"]

[profiles.full]
device = "both"
output_format = "both"
categories = ["performance", "accessibility", "best-practices", "seo"]

[profiles.core-vitals]
device = "both"
output_format = "csv"
categories = ["performance"]

[profiles.client-report]
urls_file = "client_urls.txt"
device = "both"
output_format = "both"
output_dir = "./client-reports"
categories = ["performance", "accessibility", "seo"]

Config resolution order

Settings are merged with the following priority (highest wins):

  1. CLI flags — explicit command-line arguments
  2. Profile values — via --profile name
  3. [settings] — defaults from config file
  4. Built-in defaults — hardcoded in the script
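A minimal sketch of this merge order (illustrative only; the resolve helper is not part of the tool, and the built-in defaults shown are the documented ones):

```python
def resolve(cli=None, profile=None, settings=None):
    """Merge configuration layers so that later updates win:
    defaults < [settings] < profile < CLI flags."""
    defaults = {"device": "mobile", "output_format": "csv",
                "delay": 1.5, "workers": 4}
    merged = dict(defaults)
    for layer in (settings or {}, profile or {}, cli or {}):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged

cfg = resolve(cli={"device": "both"},
              profile={"output_format": "json", "device": "desktop"},
              settings={"delay": 2.0})
print(cfg["device"])         # → both  (CLI wins over profile)
print(cfg["output_format"])  # → json  (profile wins over settings)
print(cfg["delay"])          # → 2.0   (settings win over defaults)
```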

Global flags

Flag        Short   Default           Description
--api-key           config/env        Google API key
--config    -c      auto-discovered   Path to config TOML
--profile   -p      None              Named profile from config
--verbose   -v      False             Verbose output to stderr
--version                             Print version and exit

audit / run flags

Flag              Short   Default            Description
urls                      []                 Positional URLs
--file            -f      None               File with one URL per line
--device                  mobile             mobile, desktop, or both
--output-format           csv                csv, json, or both
--output          -o      auto-timestamped   Explicit output file path
--output-dir              ./reports/         Directory for auto-named files
--delay           -d      1.5                Seconds between requests
--workers         -w      4                  Concurrent workers
--categories              performance        Lighthouse categories
--full                    False              Embed raw lighthouseResult in JSON output (ignored for CSV)
--stream                  False              Print results as NDJSON to stdout as they complete (skips file output)

Output Formats

File naming

By default, output files use UTC timestamps:

{output_dir}/{YYYYMMDD}T{HHMMSS}Z-{strategy}.{ext}

Examples:

./reports/20260216T143022Z-mobile.csv
./reports/20260216T150000Z-both.json
./reports/20260216T143022Z-report.html

Use -o to override with an explicit path.
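The naming scheme can be reproduced in a few lines of Python (an illustrative sketch; auto_filename is not part of the tool's API):

```python
from datetime import datetime, timezone
from pathlib import Path

def auto_filename(output_dir, strategy, ext, full=False, now=None):
    """Build an auto-generated output path of the form
    {output_dir}/{YYYYMMDD}T{HHMMSS}Z-{strategy}[-full].{ext}"""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    suffix = "-full" if full else ""
    return Path(output_dir) / f"{stamp}-{strategy}{suffix}.{ext}"

fixed = datetime(2026, 2, 16, 14, 30, 22, tzinfo=timezone.utc)
print(auto_filename("./reports", "mobile", "csv", now=fixed).name)
# → 20260216T143022Z-mobile.csv
```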

CSV

Flat table with one row per (URL, strategy) pair. Columns:

Column               Description
url                  The analyzed URL
strategy             mobile or desktop
performance_score    0-100 Lighthouse score
lab_fcp_ms           First Contentful Paint (ms)
lab_lcp_ms           Largest Contentful Paint (ms)
lab_cls              Cumulative Layout Shift
lab_speed_index_ms   Speed Index (ms)
lab_tbt_ms           Total Blocking Time (ms)
lab_tti_ms           Time to Interactive (ms)
field_*              Field (CrUX) metrics (when available)
error                Error message if the request failed

JSON

Structured with metadata header:

{
  "metadata": {
    "generated_at": "2026-02-16T14:30:22+00:00",
    "total_urls": 5,
    "strategies": ["mobile", "desktop"],
    "tool_version": "2.1.0"
  },
  "results": [
    {
      "url": "https://example.com",
      "strategy": "mobile",
      "performance_score": 92,
      "lab_metrics": { "lab_fcp_ms": 1200, "lab_lcp_ms": 1800, ... },
      "field_metrics": { "field_lcp_ms": 2100, "field_lcp_category": "FAST", ... },
      "error": null
    }
  ]
}

With --full, each result also includes the complete raw lighthouseResult from the API:

{
  "results": [
    {
      "url": "https://example.com",
      "strategy": "mobile",
      "performance_score": 92,
      "lab_metrics": { ... },
      "field_metrics": { ... },
      "lighthouseResult": {
        "audits": { ... },
        "categories": { ... },
        "categoryGroups": { ... },
        "configSettings": { ... },
        "environment": { ... },
        "fetchTime": "...",
        "finalUrl": "https://example.com",
        "lighthouseVersion": "...",
        "requestedUrl": "https://example.com",
        "runWarnings": [],
        "stackPacks": [],
        "timing": { ... },
        "i18n": { ... }
      },
      "error": null
    }
  ]
}

Metrics Reference

Lab data (synthetic, from Lighthouse)

Metric                     Good      Needs Work    Poor
First Contentful Paint     < 1.8s    1.8s–3.0s     > 3.0s
Largest Contentful Paint   < 2.5s    2.5s–4.0s     > 4.0s
Cumulative Layout Shift    < 0.1     0.1–0.25      > 0.25
Total Blocking Time        < 200ms   200ms–600ms   > 600ms
Speed Index                < 3.4s    3.4s–5.8s     > 5.8s
Time to Interactive        < 3.8s    3.8s–7.3s     > 7.3s
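These thresholds are easy to encode against the tool's CSV column names. A sketch of a classifier (illustrative, not part of the tool; timings in ms, CLS unitless):

```python
# (good upper bound, poor lower bound) per metric, from the table above.
THRESHOLDS = {
    "lab_fcp_ms": (1800, 3000),
    "lab_lcp_ms": (2500, 4000),
    "lab_cls": (0.1, 0.25),
    "lab_tbt_ms": (200, 600),
    "lab_speed_index_ms": (3400, 5800),
    "lab_tti_ms": (3800, 7300),
}

def rate(metric, value):
    """Classify a lab metric as GOOD / NEEDS WORK / POOR."""
    good, poor = THRESHOLDS[metric]
    if value < good:
        return "GOOD"
    if value <= poor:
        return "NEEDS WORK"
    return "POOR"

print(rate("lab_lcp_ms", 1800))  # → GOOD
print(rate("lab_tbt_ms", 400))   # → NEEDS WORK
print(rate("lab_cls", 0.3))      # → POOR
```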

Field data (real users, from CrUX)

Field data comes from the Chrome User Experience Report. It may not be available for low-traffic sites.

Metric   Description
FCP      First Contentful Paint — when first content appears
LCP      Largest Contentful Paint — when main content loads
CLS      Cumulative Layout Shift — visual stability
INP      Interaction to Next Paint — input responsiveness
FID      First Input Delay — deprecated, replaced by INP
TTFB     Time to First Byte — server response time

Rate Limits

Scenario          Limit
Without API key   ~25 queries/100 seconds
With API key      ~25,000 queries/day (400/100 seconds)

Tips:

  • Use --delay to increase time between requests if hitting rate limits
  • The tool retries on 429 (rate limit) responses with exponential backoff
  • See Concurrency Model for how --workers and --delay interact
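The retry behaviour can be sketched like this (illustrative only; RateLimited is a hypothetical stand-in for an HTTP 429 response, not a class the tool exposes):

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

def with_backoff(call, retries=4, base=1.0, sleep=time.sleep):
    """Retry `call` on rate-limit errors, doubling the wait each time."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base * 2 ** attempt)  # waits 1s, 2s, 4s, 8s, ...

attempts = []
def flaky():
    """Fails twice with a rate-limit error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited()
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # → ok
```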

Concurrency Model

The tool uses asyncio + httpx for non-blocking HTTP I/O.

How it works:

  • With --workers 1 (or effectively 1), requests run strictly sequentially — one finishes before the next starts.
  • With --workers N > 1 (default: 4), all tasks are launched together via asyncio.gather(). A shared asyncio.Semaphore(1) ensures requests start no more than once per --delay seconds:
    1. Each coroutine acquires the semaphore
    2. Sleeps the remainder of delay since the last request started
    3. Records the timestamp and releases the semaphore
    4. Makes the actual HTTP request — outside the semaphore

Because the HTTP call happens after releasing the semaphore, multiple requests can be in-flight simultaneously even though they start delay seconds apart. Wall time is therefore much shorter than n_urls × (delay + latency); it converges toward n_urls × delay + avg_latency as the number of URLs grows.
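The four steps above can be sketched as follows (an illustrative model, not the tool's actual code; fake_fetch stands in for the real API call):

```python
import asyncio
import time

async def run_all(urls, fetch, delay=1.5):
    """Pace request *starts* at least `delay` seconds apart while
    letting the requests themselves overlap in flight."""
    gate = asyncio.Semaphore(1)
    last_start = time.monotonic() - delay  # let the first start at once

    async def one(url):
        nonlocal last_start
        async with gate:                   # steps 1-3: serialise the starts
            wait = delay - (time.monotonic() - last_start)
            if wait > 0:
                await asyncio.sleep(wait)
            last_start = time.monotonic()
        return await fetch(url)            # step 4: request outside the gate

    return await asyncio.gather(*(one(u) for u in urls))

async def fake_fetch(url):                 # stand-in for the real API call
    await asyncio.sleep(0.01)
    return url

results = asyncio.run(run_all(["a", "b", "c"], fake_fetch, delay=0.02))
print(results)  # → ['a', 'b', 'c']
```

asyncio.gather preserves input order, so results line up with the URL list even though completions may interleave.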

Practical rule of thumb:

Goal                     Setting
Safest for rate limits   --workers 1 (sequential)
Default (balanced)       --workers 4 --delay 1.5
Maximum throughput       --workers 4 --delay 1.0 (watch for 429s)

Cron usage

Output files auto-increment with timestamps, so cron jobs won't overwrite previous results:

# Every Monday at 6am UTC
0 6 * * 1 cd /path/to/project && pagespeed audit -f urls.txt --profile full

Examples

The examples/ folder contains ready-to-use configuration files for common workflows:

Example             Description
basic/              Minimal config with API key, strategy, and a sample URL list
multi-profile/      Named profiles for quick, full, and client-report workflows
ci-budget/          Strict and lenient performance budgets for CI pipelines
sitemap-pipeline/   Sitemap auto-discovery with regex filters and section-specific profiles

Copy any example folder into your project and edit to taste. See examples/README.md for full details.

Testing

The project includes a comprehensive test suite (168 tests across 30 test classes). All tests run offline — API calls, sitemap fetches, and file I/O are mocked.

# Run all tests
uv run pytest test_pagespeed_insights_tool.py -v

# Run a single test class
uv run pytest test_pagespeed_insights_tool.py -v -k TestValidateUrl

# Run a specific test method
uv run pytest test_pagespeed_insights_tool.py -v -k test_full_extraction

License

This project is licensed under the MIT License.
