
PageSpeed Insights Batch Analysis Tool

A command-line tool that automates Google PageSpeed Insights analysis across multiple URLs, extracting performance metrics (lab + field data) into structured CSV, JSON, and HTML reports.

Installation

Run instantly with uvx (recommended, no install needed)

uvx pagespeed quick-check https://example.com

Install with pip or pipx

pip install pagespeed
pagespeed quick-check https://example.com

Run from URL (just needs uv)

uv run https://raw.githubusercontent.com/volkanunsal/pagespeed/main/pagespeed_insights_tool.py quick-check https://example.com

Development

git clone https://github.com/volkanunsal/pagespeed.git
cd pagespeed
uv run pagespeed_insights_tool.py quick-check https://example.com

Prerequisites

  • Python 3.13+
  • Google API key (optional) — without one, you're limited to ~25 queries per 100 seconds; with one, ~25,000 queries/day

Getting an API Key

  1. Go to the Google Cloud Console
  2. Create a new project (or select an existing one)
  3. Navigate to APIs & Services > Library
  4. Search for PageSpeed Insights API and enable it
  5. Go to APIs & Services > Credentials
  6. Click Create Credentials > API Key
  7. Copy the key and set it:
    export PAGESPEED_API_KEY=your_key_here
    Or add it to your pagespeed.toml config file (see Configuration).

Usage

quick-check — Fast single-URL spot check

Prints a formatted report to the terminal. No files written.

# Mobile only (default)
pagespeed quick-check https://www.google.com

# Both mobile and desktop
pagespeed quick-check https://www.google.com --device both

# With specific categories
pagespeed quick-check https://www.google.com --categories performance accessibility

Sample output:

============================================================
  URL:      https://www.google.com
  Strategy: mobile
============================================================
  Performance Score: 92/100 (GOOD)

  --- Lab Data ---
  First Contentful Paint............. 1200ms
  Largest Contentful Paint........... 1800ms
  Cumulative Layout Shift............ 0.0100
  Speed Index........................ 1500ms
  Total Blocking Time................ 150ms
  Time to Interactive................ 2100ms

audit — Full batch analysis

Analyzes multiple URLs and writes CSV/JSON reports.

# From a file of URLs
pagespeed audit -f urls.txt

# Multiple strategies and output formats
pagespeed audit -f urls.txt --device both --output-format both

# Inline URLs with custom output path
pagespeed audit https://a.com https://b.com -o report

# With a named profile
pagespeed audit -f urls.txt --profile full

# Piped input
cat urls.txt | pagespeed audit

# Include full Lighthouse audit data in JSON output
pagespeed audit -f urls.txt --full --output-format json

# Stream results as NDJSON to stdout as they complete
pagespeed audit -f urls.txt --stream


--full flag

Pass --full to embed the complete raw lighthouseResult object from the PageSpeed API into each result in the JSON output. This includes all Lighthouse audits, opportunities, diagnostics, and metadata — useful for deep analysis or feeding into other tools.

  • JSON: each result gains a top-level lighthouseResult key containing the full API object.
  • CSV: --full is silently ignored; the raw object is never written to CSV.
  • File naming: auto-named files get a -full suffix (e.g., 20260219T143022Z-mobile-full.json).
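As an illustration of what --full enables, the hypothetical helper below (not part of the tool) ranks the timed audits inside a result by their numericValue, a millisecond field Lighthouse attaches to many audits in the PSI v5 response:

```python
def slowest_audits(result, top=3):
    """Rank Lighthouse audits in a --full result by numericValue, descending.

    Assumes the PSI v5 shape: result["lighthouseResult"]["audits"] maps
    audit ids to objects that may carry a numeric "numericValue".
    """
    audits = result["lighthouseResult"]["audits"]
    timed = [(aid, a["numericValue"]) for aid, a in audits.items()
             if isinstance(a.get("numericValue"), (int, float))]
    return sorted(timed, key=lambda t: t[1], reverse=True)[:top]

# Abbreviated sample of a --full result:
sample = {"lighthouseResult": {"audits": {
    "largest-contentful-paint": {"numericValue": 1800},
    "first-contentful-paint": {"numericValue": 1200},
    "cumulative-layout-shift": {"score": 1},  # no numericValue: skipped
}}}
print(slowest_audits(sample, top=2))
# → [('largest-contentful-paint', 1800), ('first-contentful-paint', 1200)]
```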

--stream flag

Pass --stream to print results to stdout as NDJSON (one JSON object per line) as each URL/strategy completes, instead of buffering everything and writing files at the end. This lets you pipe results into jq, grep, or other tools without waiting for the full batch to finish.

  • Output: one json.dumps line per result written to stdout immediately on completion.
  • File output: skipped — no CSV/JSON files are written in stream mode.
  • Summary: the post-run audit summary table is suppressed (not useful when piping).
  • Progress bar: still shown on stderr so you can track progress while piping stdout.
  • Budget: still evaluated if --budget is set, using the complete result set.

# Stream all results to stdout
pagespeed audit -f urls.txt --stream

# Extract a single field from each result
pagespeed audit -f urls.txt --stream | jq '.performance_score'

# Filter to only URLs below a score threshold
pagespeed audit -f urls.txt --stream | jq 'select(.performance_score < 50)'

# Save streamed results to a file while also viewing them
pagespeed audit -f urls.txt --stream | tee results.ndjson | jq '.url'

Each NDJSON line is a flat JSON object with the same fields as a CSV row (url, strategy, performance_score, lab_fcp_ms, etc.). null is used where a value is not available.
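Because each line is self-contained JSON, streamed output is easy to post-process without jq. This illustrative helper (not part of the tool) averages the scores while skipping nulls:

```python
import json

def summarize_ndjson(lines):
    """Average performance score across streamed NDJSON results,
    skipping results whose score is null (e.g. failed requests)."""
    scores = []
    for line in lines:
        result = json.loads(line)
        score = result.get("performance_score")
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None

# Two streamed lines, one with a null score:
stream = [
    '{"url": "https://a.com", "strategy": "mobile", "performance_score": 92}',
    '{"url": "https://b.com", "strategy": "mobile", "performance_score": null}',
]
print(summarize_ndjson(stream))  # → 92.0
```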

The URL file is one URL per line. Lines starting with # are comments:

# Main pages
https://example.com
https://example.com/about
https://example.com/contact

compare — Compare two reports

Loads two previous report files and shows per-URL score changes.

# Compare before and after
pagespeed compare before.csv after.csv

# Custom threshold (flag changes >= 10%)
pagespeed compare --threshold 10 old.json new.json

Output flags regressions with !! and improvements with ++.
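The comparison logic can be sketched roughly like this (a simplified illustration, not the tool's actual implementation; inputs map url to performance_score as in a report row):

```python
def score_deltas(before, after, threshold=5.0):
    """Per-URL score change between two runs; flag moves beyond the
    threshold with '!!' (regression) or '++' (improvement)."""
    out = {}
    for url in sorted(before.keys() & after.keys()):
        delta = after[url] - before[url]
        flag = "!!" if delta <= -threshold else "++" if delta >= threshold else ""
        out[url] = (delta, flag)
    return out

deltas = score_deltas({"https://a.com": 90, "https://b.com": 60},
                      {"https://a.com": 82, "https://b.com": 70})
print(deltas["https://a.com"])  # → (-8, '!!')
print(deltas["https://b.com"])  # → (10, '++')
```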

report — Generate HTML dashboard

Creates a self-contained HTML report from a results file.

# Generate HTML from CSV results
pagespeed report results.csv

# Custom output path
pagespeed report results.json -o dashboard.html

# Auto-open in browser
pagespeed report results.csv --open

The HTML report includes:

  • Summary cards (total URLs, average/median/best/worst scores)
  • Color-coded score table (green/orange/red)
  • Core Web Vitals pass/fail indicators
  • Bar charts comparing scores across URLs
  • Field data table (when available)
  • Sortable columns (click headers)

run — Low-level direct access

Full control with every CLI flag. Same internals as audit.

pagespeed run https://example.com --device desktop --categories performance accessibility --delay 2.0

pipeline — End-to-end analysis

Resolves URLs from a sitemap (or file/inline), runs the analysis, writes CSV/JSON data files, and generates an HTML report — all in one command. Optionally evaluates a performance budget.

# From a sitemap (auto-detected from URL shape)
pagespeed pipeline https://example.com/sitemap.xml

# Limit URLs and auto-open report in browser
pagespeed pipeline https://example.com/sitemap.xml --sitemap-limit 20 --open

# Filter to a section of the sitemap, both devices
pagespeed pipeline https://example.com/sitemap.xml --sitemap-filter "/blog/" --device both

# Inline URLs
pagespeed pipeline https://a.com https://b.com --device both

# From a URL file
pagespeed pipeline -f urls.txt --open

# Data files only — skip HTML report generation
pagespeed pipeline -f urls.txt --no-report --output-format json

# Evaluate Core Web Vitals budget (exits 2 on failure)
pagespeed pipeline https://example.com/sitemap.xml --budget cwv

# Custom budget with GitHub Actions output format
pagespeed pipeline https://example.com/sitemap.xml --budget budget.toml --budget-format github

Sitemap auto-detection: when a single positional argument looks like a sitemap (ends in .xml, contains sitemap in the path, or the file content starts with <?xml), it is treated as a sitemap source automatically. Pass --sitemap explicitly to use a sitemap alongside inline URLs.
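The detection heuristic can be approximated in Python like so (an illustrative sketch; the tool's real logic may differ in details):

```python
from pathlib import Path
from urllib.parse import urlparse

def looks_like_sitemap(source: str) -> bool:
    """Approximate the auto-detection rules: an .xml suffix, 'sitemap'
    in the path, or a local file whose content starts with '<?xml'."""
    path = urlparse(source).path if "://" in source else source
    if path.endswith(".xml") or "sitemap" in path.lower():
        return True
    local = Path(source)
    if local.is_file():
        head = local.read_text(encoding="utf-8", errors="ignore")[:64]
        return head.lstrip().startswith("<?xml")
    return False

print(looks_like_sitemap("https://example.com/sitemap.xml"))  # → True
print(looks_like_sitemap("https://example.com/about"))        # → False
```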

pipeline flags

Flag               Short   Default   Description
source                     []        Sitemap URL/path (auto-detected) or plain URLs
--file             -f      None      File with one URL per line
--sitemap                  None      Explicit sitemap URL or local path
--sitemap-limit            None      Max URLs to extract from sitemap
--sitemap-filter           None      Regex to filter sitemap URLs
--open                     False     Auto-open HTML report in browser after completion
--no-report                False     Skip HTML report; write data files only
--budget                   None      Budget file (TOML) or cwv preset — exits 2 on failure
--budget-format            text      Budget output format: text, json, or github
--webhook                  None      Webhook URL for budget result notifications
--webhook-on               always    When to send webhook: always or fail

All audit flags (--device, --output-format, --output, --output-dir, --delay, --workers, --categories) also apply.

Configuration

Config file: pagespeed.toml

An optional TOML file for persistent settings and named profiles. The tool searches for it in:

  1. Current working directory (./pagespeed.toml)
  2. User config directory (~/.config/pagespeed/config.toml)

You can also pass an explicit path with --config path/to/config.toml.

[settings]
api_key = "YOUR_API_KEY"       # or use PAGESPEED_API_KEY env var
urls_file = "urls.txt"         # default URL file for -f
delay = 1.5                    # seconds between API requests
device = "mobile"              # mobile, desktop, or both
output_format = "csv"          # csv, json, or both
output_dir = "./reports"       # directory for output files
workers = 4                    # concurrent workers (1 = sequential)
categories = ["performance"]   # Lighthouse categories
verbose = false

[profiles.quick]
device = "mobile"
output_format = "csv"
categories = ["performance"]

[profiles.full]
device = "both"
output_format = "both"
categories = ["performance", "accessibility", "best-practices", "seo"]

[profiles.core-vitals]
device = "both"
output_format = "csv"
categories = ["performance"]

[profiles.client-report]
urls_file = "client_urls.txt"
device = "both"
output_format = "both"
output_dir = "./client-reports"
categories = ["performance", "accessibility", "seo"]

Config resolution order

Settings are merged with the following priority (highest wins):

  1. CLI flags — explicit command-line arguments
  2. Profile values — via --profile name
  3. [settings] — defaults from config file
  4. Built-in defaults — hardcoded in the script
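A minimal sketch of this merge order (illustrative only; the resolve helper is not part of the tool, and the built-in defaults shown are the documented ones):

```python
def resolve(cli=None, profile=None, settings=None):
    """Merge configuration layers so that later updates win:
    defaults < [settings] < profile < CLI flags."""
    defaults = {"device": "mobile", "output_format": "csv",
                "delay": 1.5, "workers": 4}
    merged = dict(defaults)
    for layer in (settings or {}, profile or {}, cli or {}):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged

cfg = resolve(cli={"device": "both"},
              profile={"output_format": "json", "device": "desktop"},
              settings={"delay": 2.0})
print(cfg["device"])         # → both  (CLI wins over profile)
print(cfg["output_format"])  # → json  (profile wins over settings)
print(cfg["delay"])          # → 2.0   (settings win over defaults)
```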

Global flags

Flag        Short   Default           Description
--api-key           config/env        Google API key
--config    -c      auto-discovered   Path to config TOML
--profile   -p      None              Named profile from config
--verbose   -v      False             Verbose output to stderr
--version                             Print version and exit

audit / run flags

Flag              Short   Default            Description
urls                      []                 Positional URLs
--file            -f      None               File with one URL per line
--device                  mobile             mobile, desktop, or both
--output-format           csv                csv, json, or both
--output          -o      auto-timestamped   Explicit output file path
--output-dir              ./reports/         Directory for auto-named files
--delay           -d      1.5                Seconds between requests
--workers         -w      4                  Concurrent workers
--categories              performance        Lighthouse categories
--full                    False              Embed raw lighthouseResult in JSON output (ignored for CSV)
--stream                  False              Print results as NDJSON to stdout as they complete (skips file output)

Output Formats

File naming

By default, output files use UTC timestamps:

{output_dir}/{YYYYMMDD}T{HHMMSS}Z-{strategy}.{ext}

Examples:

./reports/20260216T143022Z-mobile.csv
./reports/20260216T150000Z-both.json
./reports/20260216T143022Z-report.html

Use -o to override with an explicit path.
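The naming scheme can be reproduced in a few lines of Python (an illustrative sketch; auto_filename is not part of the tool's API):

```python
from datetime import datetime, timezone
from pathlib import Path

def auto_filename(output_dir, strategy, ext, full=False, now=None):
    """Build an auto-generated output path of the form
    {output_dir}/{YYYYMMDD}T{HHMMSS}Z-{strategy}[-full].{ext}"""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    suffix = "-full" if full else ""
    return Path(output_dir) / f"{stamp}-{strategy}{suffix}.{ext}"

fixed = datetime(2026, 2, 16, 14, 30, 22, tzinfo=timezone.utc)
print(auto_filename("./reports", "mobile", "csv", now=fixed).name)
# → 20260216T143022Z-mobile.csv
```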

CSV

Flat table with one row per (URL, strategy) pair. Columns:

Column               Description
url                  The analyzed URL
strategy             mobile or desktop
performance_score    0-100 Lighthouse score
lab_fcp_ms           First Contentful Paint (ms)
lab_lcp_ms           Largest Contentful Paint (ms)
lab_cls              Cumulative Layout Shift
lab_speed_index_ms   Speed Index (ms)
lab_tbt_ms           Total Blocking Time (ms)
lab_tti_ms           Time to Interactive (ms)
field_*              Field (CrUX) metrics (when available)
error                Error message if the request failed

JSON

Structured with metadata header:

{
  "metadata": {
    "generated_at": "2026-02-16T14:30:22+00:00",
    "total_urls": 5,
    "strategies": ["mobile", "desktop"],
    "tool_version": "2.1.0"
  },
  "results": [
    {
      "url": "https://example.com",
      "strategy": "mobile",
      "performance_score": 92,
      "lab_metrics": { "lab_fcp_ms": 1200, "lab_lcp_ms": 1800, ... },
      "field_metrics": { "field_lcp_ms": 2100, "field_lcp_category": "FAST", ... },
      "error": null
    }
  ]
}

With --full, each result also includes the complete raw lighthouseResult from the API:

{
  "results": [
    {
      "url": "https://example.com",
      "strategy": "mobile",
      "performance_score": 92,
      "lab_metrics": { ... },
      "field_metrics": { ... },
      "lighthouseResult": {
        "audits": { ... },
        "categories": { ... },
        "categoryGroups": { ... },
        "configSettings": { ... },
        "environment": { ... },
        "fetchTime": "...",
        "finalUrl": "https://example.com",
        "lighthouseVersion": "...",
        "requestedUrl": "https://example.com",
        "runWarnings": [],
        "stackPacks": [],
        "timing": { ... },
        "i18n": { ... }
      },
      "error": null
    }
  ]
}

Metrics Reference

Lab data (synthetic, from Lighthouse)

Metric                     Good      Needs Work    Poor
First Contentful Paint     < 1.8s    1.8s–3.0s     > 3.0s
Largest Contentful Paint   < 2.5s    2.5s–4.0s     > 4.0s
Cumulative Layout Shift    < 0.1     0.1–0.25      > 0.25
Total Blocking Time        < 200ms   200ms–600ms   > 600ms
Speed Index                < 3.4s    3.4s–5.8s     > 5.8s
Time to Interactive        < 3.8s    3.8s–7.3s     > 7.3s
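These thresholds are easy to encode against the tool's CSV column names. A sketch of a classifier (illustrative, not part of the tool; timings in ms, CLS unitless):

```python
# (good upper bound, poor lower bound) per metric, from the table above.
THRESHOLDS = {
    "lab_fcp_ms": (1800, 3000),
    "lab_lcp_ms": (2500, 4000),
    "lab_cls": (0.1, 0.25),
    "lab_tbt_ms": (200, 600),
    "lab_speed_index_ms": (3400, 5800),
    "lab_tti_ms": (3800, 7300),
}

def rate(metric, value):
    """Classify a lab metric as GOOD / NEEDS WORK / POOR."""
    good, poor = THRESHOLDS[metric]
    if value < good:
        return "GOOD"
    if value <= poor:
        return "NEEDS WORK"
    return "POOR"

print(rate("lab_lcp_ms", 1800))  # → GOOD
print(rate("lab_tbt_ms", 400))   # → NEEDS WORK
print(rate("lab_cls", 0.3))      # → POOR
```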

Field data (real users, from CrUX)

Field data comes from the Chrome User Experience Report. It may not be available for low-traffic sites.

Metric   Description
FCP      First Contentful Paint — when first content appears
LCP      Largest Contentful Paint — when main content loads
CLS      Cumulative Layout Shift — visual stability
INP      Interaction to Next Paint — input responsiveness
FID      First Input Delay — deprecated, replaced by INP
TTFB     Time to First Byte — server response time

Rate Limits

Scenario          Limit
Without API key   ~25 queries/100 seconds
With API key      ~25,000 queries/day (400/100 seconds)

Tips:

  • Use --delay to increase time between requests if hitting rate limits
  • The tool retries on 429 (rate limit) responses with exponential backoff
  • See Concurrency Model for how --workers and --delay interact
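The retry behaviour can be sketched like this (illustrative only; RateLimited is a hypothetical stand-in for an HTTP 429 response, not a class the tool exposes):

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""

def with_backoff(call, retries=4, base=1.0, sleep=time.sleep):
    """Retry `call` on rate-limit errors, doubling the wait each time."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base * 2 ** attempt)  # waits 1s, 2s, 4s, 8s, ...

attempts = []
def flaky():
    """Fails twice with a rate-limit error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited()
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # → ok
```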

Concurrency Model

The tool uses asyncio + httpx for non-blocking HTTP I/O.

How it works:

  • With --workers 1 (or effectively 1), requests run strictly sequentially — one finishes before the next starts.
  • With --workers N > 1 (default: 4), all tasks are launched together via asyncio.gather(). A shared asyncio.Semaphore(1) ensures requests start no more than once per --delay seconds:
    1. Each coroutine acquires the semaphore
    2. Sleeps the remainder of delay since the last request started
    3. Records the timestamp and releases the semaphore
    4. Makes the actual HTTP request — outside the semaphore

Because the HTTP call happens after releasing the semaphore, multiple requests can be in-flight simultaneously even though they start delay seconds apart. Wall time is therefore much shorter than n_urls × (delay + latency); it converges toward n_urls × delay + avg_latency as the number of URLs grows.
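The four steps above can be sketched as follows (an illustrative model, not the tool's actual code; fake_fetch stands in for the real API call):

```python
import asyncio
import time

async def run_all(urls, fetch, delay=1.5):
    """Pace request *starts* at least `delay` seconds apart while
    letting the requests themselves overlap in flight."""
    gate = asyncio.Semaphore(1)
    last_start = time.monotonic() - delay  # let the first start at once

    async def one(url):
        nonlocal last_start
        async with gate:                   # steps 1-3: serialise the starts
            wait = delay - (time.monotonic() - last_start)
            if wait > 0:
                await asyncio.sleep(wait)
            last_start = time.monotonic()
        return await fetch(url)            # step 4: request outside the gate

    return await asyncio.gather(*(one(u) for u in urls))

async def fake_fetch(url):                 # stand-in for the real API call
    await asyncio.sleep(0.01)
    return url

results = asyncio.run(run_all(["a", "b", "c"], fake_fetch, delay=0.02))
print(results)  # → ['a', 'b', 'c']
```

asyncio.gather preserves input order, so results line up with the URL list even though completions may interleave.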

Practical rule of thumb:

Goal                     Setting
Safest for rate limits   --workers 1 (sequential)
Default (balanced)       --workers 4 --delay 1.5
Maximum throughput       --workers 4 --delay 1.0 (watch for 429s)

Cron usage

Output files auto-increment with timestamps, so cron jobs won't overwrite previous results:

# Every Monday at 6am UTC
0 6 * * 1 cd /path/to/project && pagespeed audit -f urls.txt --profile full

Examples

The examples/ folder contains ready-to-use configuration files for common workflows:

Example             Description
basic/              Minimal config with API key, strategy, and a sample URL list
multi-profile/      Named profiles for quick, full, and client-report workflows
ci-budget/          Strict and lenient performance budgets for CI pipelines
sitemap-pipeline/   Sitemap auto-discovery with regex filters and section-specific profiles

Copy any example folder into your project and edit to taste. See examples/README.md for full details.

Testing

The project includes a comprehensive test suite (168 tests across 30 test classes). All tests run offline — API calls, sitemap fetches, and file I/O are mocked.

# Run all tests
uv run pytest test_pagespeed_insights_tool.py -v

# Run a single test class
uv run pytest test_pagespeed_insights_tool.py -v -k TestValidateUrl

# Run a specific test method
uv run pytest test_pagespeed_insights_tool.py -v -k test_full_extraction

License

This project is licensed under the MIT License.
