Checking PageSpeed Insights for multiple URLs is manual, slow, and doesn't produce structured data for tracking over time. Teams need a way to batch-analyze URLs, compare results across runs, and share visual reports.
- Automate PageSpeed Insights analysis across multiple URLs
- Extract both lab (Lighthouse) and field (CrUX) performance data
- Output structured reports in CSV, JSON, and HTML formats
- Support concurrent processing with rate limiting
- Zero-setup experience via `uv run` with PEP 723 inline metadata
Single-file CLI tool. No web server, no database, no persistent state beyond output files and optional TOML config.
The tool is a single `pagespeed_insights_tool.py` file with inline dependency metadata:
```python
# /// script
# requires-python = ">=3.13"
# dependencies = ["requests", "pandas"]
# ///
```

Why: `uv run pagespeed_insights_tool.py` handles everything — resolves dependencies, creates a virtual environment, and runs the script. No `pyproject.toml`, no `pip install`, no project scaffolding. The tool can be shared as a single file.
Trade-off: All code in one file. This is acceptable at the current scale (~800 lines) but would warrant refactoring into a package if the tool grows significantly.
Why: Standard library, no extra dependency. The CLI has 5 subcommands with straightforward flags — argparse handles this well. Custom TrackingAction classes track which flags were explicitly set, enabling clean config-file merging.
Trade-off: More boilerplate than click/typer, but avoids adding a dependency for something the stdlib handles.
Why: The workload is I/O-bound (HTTP requests to Google's API). ThreadPoolExecutor is simpler to reason about and debug than asyncio, with comparable performance for this use case. A shared threading.Semaphore handles rate limiting.
Trade-off: Less efficient than asyncio for very high concurrency, but the API rate limits cap useful concurrency at ~4 workers anyway.
Why: Python 3.11+ includes tomllib in the standard library. TOML is human-readable, supports comments, and is the standard for Python project configuration. No extra dependency needed.
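As a sketch, the loader can be as small as this (error handling for malformed TOML is elided; the exact signature in the tool may differ):

```python
import tomllib
from pathlib import Path

def load_config(path: Path) -> dict:
    """Read the TOML config; an absent file simply means 'use defaults'."""
    if not path.is_file():
        return {}
    with path.open("rb") as fp:  # tomllib requires a binary file object
        return tomllib.load(fp)
```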
Why: Lab data (Lighthouse) gives consistent, reproducible synthetic results. Field data (CrUX) shows real-user experience. They're complementary — lab data catches regressions during development, field data validates real-world impact.
Why: Running the tool repeatedly (e.g., via cron) should never overwrite previous results. Timestamped filenames (20260216T143022Z-mobile.csv) accumulate safely and enable historical comparison via the compare subcommand.
```text
URL Input                            Config
─────────                            ──────
positional args ─┐                   config file ──── load_config ──── settings
URL file (--file)┼─ validate         profile (--profile) ── apply_profile
stdin (pipe)    ─┘     │                                │
                       └──── URLs ───────┬──────────────┘
                                         ▼
                  ThreadPoolExecutor + Semaphore rate limit      (Processing)
                                         │
                                         ▼
                  fetch_pagespeed_result (with retry logic)      (API Layer)
                                         │
                                         ▼
                                 PageSpeed API v5
                                         │
                                         ▼
                        extract_metrics (per response)
                                         │
                  ┌───────────┬──────────┴───┬───────────────┐
                  ▼           ▼              ▼               ▼           (Output)
              CSV file    JSON file     HTML report   Terminal table
```
| Component | Responsibility |
|---|---|
| CLI Parser (`build_argument_parser`) | Defines subcommands and flags, tracks explicit arguments |
| Config Loader (`load_config`, `apply_profile`) | Reads TOML, merges settings/profiles/CLI with correct priority |
| URL Handler (`validate_url`, `load_urls`) | Normalizes URLs, reads from args/file/stdin |
| API Client (`fetch_pagespeed_result`) | HTTP requests with retry logic, error classification |
| Metrics Parser (`extract_metrics`) | Extracts lab + field data from API response into flat dict |
| Batch Processor (`process_urls`) | Orchestrates concurrent fetching with rate limiting |
| CSV Formatter (`output_csv`) | Flat table output via pandas |
| JSON Formatter (`output_json`) | Structured output with metadata envelope |
| HTML Generator (`generate_html_report`) | Self-contained dashboard with inline CSS/JS |
| Subcommand Handlers (`cmd_*`) | Wire inputs to processing to outputs for each workflow |
```text
CLI flags (explicit)
        │
        ▼
Profile values (--profile name)
        │
        ▼
[settings] from config file
        │
        ▼
Built-in defaults (hardcoded)
```
The TrackingAction argparse action records which flags were explicitly set on the command line. During config merging, only unset flags are filled from profile/settings/defaults.
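A minimal sketch of the pattern (the `_explicit` attribute name and the merge loop are assumptions; the real class may differ in detail):

```python
import argparse

class TrackingAction(argparse.Action):
    """Store the value and remember that the flag was set explicitly."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values)
        explicit = getattr(namespace, "_explicit", set())
        explicit.add(self.dest)
        namespace._explicit = explicit

parser = argparse.ArgumentParser()
parser.add_argument("--workers", type=int, default=4, action=TrackingAction)
args = parser.parse_args(["--workers", "2"])

# During merging, config values only fill flags the user did not set:
for key, value in {"workers": 8, "strategy": "mobile"}.items():
    if key not in getattr(args, "_explicit", set()):
        setattr(args, key, value)
# args.workers stays 2; args.strategy is filled from the config.
```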
```
GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed
```
| Parameter | Value |
|---|---|
| `url` | Target URL to analyze |
| `strategy` | `mobile` or `desktop` |
| `category` | `performance`, `accessibility`, `best-practices`, `seo` (multiple allowed) |
| `key` | API key (optional) |
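A hedged sketch of the request (the 60-second timeout is illustrative; passing `params` as a list of tuples is how `requests` repeats the `category` key):

```python
import requests

API_URL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

params = [
    ("url", "https://example.com"),
    ("strategy", "mobile"),
    ("category", "performance"),  # repeat the key for multiple categories
    ("category", "seo"),
    # ("key", api_key),           # optional; raises quota limits
]
response = requests.get(API_URL, params=params, timeout=60)
response.raise_for_status()
data = response.json()
```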
The API returns a large JSON object. The tool extracts from two main sections:
```text
response
├── lighthouseResult
│   ├── categories.performance.score        → performance_score (×100)
│   ├── audits.first-contentful-paint       → lab_fcp_ms
│   ├── audits.largest-contentful-paint     → lab_lcp_ms
│   ├── audits.cumulative-layout-shift      → lab_cls
│   ├── audits.speed-index                  → lab_speed_index_ms
│   ├── audits.total-blocking-time          → lab_tbt_ms
│   └── audits.interactive                  → lab_tti_ms
│
└── loadingExperience
    └── metrics
        ├── FIRST_CONTENTFUL_PAINT_MS       → field_fcp_ms + category
        ├── LARGEST_CONTENTFUL_PAINT_MS     → field_lcp_ms + category
        ├── CUMULATIVE_LAYOUT_SHIFT_SCORE   → field_cls + category
        ├── INTERACTION_TO_NEXT_PAINT       → field_inp_ms + category
        ├── FIRST_INPUT_DELAY_MS            → field_fid_ms + category
        └── EXPERIMENTAL_TIME_TO_FIRST_BYTE → field_ttfb_ms + category
```
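A sketch of the flattening step, assuming the tuple shapes described under metric extension below (the lists here are truncated to a few entries; `numericValue`, `percentile`, and `category` are the relevant fields in the API response):

```python
LAB_METRICS = [  # (Lighthouse audit id, output column)
    ("first-contentful-paint", "lab_fcp_ms"),
    ("largest-contentful-paint", "lab_lcp_ms"),
    ("cumulative-layout-shift", "lab_cls"),
]
FIELD_METRICS = [  # (CrUX metric key, value column, category column)
    ("FIRST_CONTENTFUL_PAINT_MS", "field_fcp_ms", "field_fcp_category"),
    ("LARGEST_CONTENTFUL_PAINT_MS", "field_lcp_ms", "field_lcp_category"),
]

def extract_metrics(response: dict) -> dict:
    """Flatten one API response into a single row dict."""
    row: dict = {}
    lh = response.get("lighthouseResult", {})
    score = lh.get("categories", {}).get("performance", {}).get("score")
    row["performance_score"] = round(score * 100) if score is not None else None
    audits = lh.get("audits", {})
    for audit_id, column in LAB_METRICS:
        row[column] = audits.get(audit_id, {}).get("numericValue")
    field = response.get("loadingExperience", {}).get("metrics", {})
    for api_key, value_col, category_col in FIELD_METRICS:
        metric = field.get(api_key, {})
        row[value_col] = metric.get("percentile")
        row[category_col] = metric.get("category")
    return row
```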
```text
                        ┌──────────────┐
Worker 1 ───acquire───▶ │              │
Worker 2 ───acquire───▶ │  Semaphore   │ ──── only 1 request at a time
Worker 3 ───acquire───▶ │  (value=1)   │      despite N workers
Worker 4 ───acquire───▶ │              │
                        └──────┬───────┘
                               │
                        delay between
                        requests (1.5s)
```
The semaphore ensures rate limiting even with multiple workers. Workers prepare results in parallel but serialize actual API calls.
| Status Code | Action |
|---|---|
| 200 | Success — return response |
| 429 | Rate limited — honor Retry-After header, exponential backoff |
| 500, 503 | Server error — exponential backoff (2s, 4s, 8s) |
| 4xx (other) | Permanent error — fail immediately |
| Network error | Retry with exponential backoff |
Maximum 3 retries per request. After exhaustion, the URL is recorded with an error and processing continues.
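A sketch of the retry loop implementing the table above (the backoff constants beyond the documented 3-retry cap, the timeout, and the exact signature are assumptions):

```python
import time
import requests

MAX_RETRIES = 3  # per the policy above
RETRYABLE = {429, 500, 503}
API_URL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def fetch_pagespeed_result(params: dict) -> dict:
    for attempt in range(MAX_RETRIES + 1):
        try:
            resp = requests.get(API_URL, params=params, timeout=60)
        except requests.RequestException:
            if attempt == MAX_RETRIES:
                raise                       # network retries exhausted
            time.sleep(2 ** (attempt + 1))  # 2s, 4s, 8s
            continue
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code not in RETRYABLE or attempt == MAX_RETRIES:
            resp.raise_for_status()         # permanent 4xx, or retries exhausted
        # 429: honor Retry-After when the server provides it
        retry_after = resp.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else 2 ** (attempt + 1))
    raise AssertionError("unreachable")
```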
```text
main thread
 │
 ├── ThreadPoolExecutor(max_workers=N)
 │    │
 │    ├── Worker 1 ──┐
 │    ├── Worker 2 ──┤
 │    ├── Worker 3 ──┼── Semaphore(1) ── rate-limited API call
 │    └── Worker 4 ──┘        │
 │                            │
 │                     delay enforcement
 │                     (time.monotonic)
 │
 ├── Lock for progress counter
 │
 └── Collect results via as_completed()
```
- Workers prepare and wait for their turn via the semaphore
- Rate limiting is enforced by checking elapsed time since last request
- Progress is reported thread-safely via a lock and counter
- Error isolation: one URL failing doesn't affect others — the error is recorded in the result dict
When --workers 1 is specified, processing runs sequentially without ThreadPoolExecutor overhead.
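A condensed sketch of the model (the sequential `--workers 1` path and the progress counter are omitted; `_gate`, `_last`, and the stand-in API client are illustrative names):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

DELAY = 1.5                     # minimum gap between API calls, in seconds
_gate = threading.Semaphore(1)  # serializes the actual requests
_last = 0.0                     # time.monotonic() of the previous call

def fetch_pagespeed_result(url: str) -> dict:
    """Stand-in for the real API client with retry logic described above."""
    return {"performance_score": None}

def fetch_one(url: str) -> dict:
    global _last
    with _gate:  # only one request in flight at a time, despite N workers
        wait = DELAY - (time.monotonic() - _last)
        if wait > 0:
            time.sleep(wait)
        _last = time.monotonic()
        return {"url": url, **fetch_pagespeed_result(url)}

def process_urls(urls: list[str], max_workers: int = 4) -> list[dict]:
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, u): u for u in urls}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:
                # Error isolation: one failure never aborts the batch.
                results.append({"url": futures[fut], "error": str(exc)})
    return results
```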
One row per (URL, strategy) pair. Column order:
```text
url, strategy, performance_score, accessibility_score, best_practices_score, seo_score,
lab_fcp_ms, lab_lcp_ms, lab_cls, lab_speed_index_ms, lab_tbt_ms, lab_tti_ms,
field_fcp_ms, field_fcp_category, field_lcp_ms, field_lcp_category, ...
fetch_time, error
```
Missing values are empty cells (pandas default).
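The CSV path can be as thin as the following sketch; metrics absent from a row surface as NaN in the DataFrame and render as empty cells:

```python
import pandas as pd

def output_csv(rows: list[dict], path: str) -> None:
    """Write one row per (URL, strategy) pair."""
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)  # NaN values become empty cells by default
```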
```json
{
  "metadata": {
    "generated_at": "ISO-8601 timestamp",
    "total_urls": "integer",
    "strategies": ["mobile", "desktop"],
    "tool_version": "semver"
  },
  "results": [
    {
      "url": "string",
      "strategy": "string",
      "performance_score": "integer|null",
      "lab_metrics": { "lab_fcp_ms": "number", ... },
      "field_metrics": { "field_lcp_ms": "number", "field_lcp_category": "string", ... },
      "fetch_time": "ISO-8601|null",
      "error": "string|null"
    }
  ]
}
```

Lab and field metrics are nested for readability. The metadata envelope enables automated processing of report files.
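A sketch of the envelope construction (the version constant, distinct-URL counting, and function signature are illustrative assumptions):

```python
import json
from datetime import datetime, timezone

TOOL_VERSION = "0.0.0"  # placeholder; the tool's real version string is not shown here

def output_json(results: list[dict], path: str, strategies: list[str]) -> None:
    report = {
        "metadata": {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "total_urls": len({r.get("url") for r in results}),  # distinct URLs, not rows
            "strategies": strategies,
            "tool_version": TOOL_VERSION,
        },
        "results": results,
    }
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(report, fp, indent=2)
```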
The HTML report is self-contained — inline CSS and minimal inline JS, no external dependencies.
| Section | Implementation |
|---|---|
| Summary cards | CSS Grid with score-colored values |
| Score table | HTML table with color-coded cells via CSS classes |
| Bar charts | Pure CSS (width: N%) with colored backgrounds |
| CWV indicators | Pass/fail based on Google's published thresholds |
| Sortable columns | Inline JS — onclick on <th> elements, DOM reordering |
| Field data table | Separate table, only rendered when field data exists |
Add entries to the `LAB_METRICS` or `FIELD_METRICS` lists:

```python
LAB_METRICS.append(("new-audit-id", "lab_new_metric_ms"))
FIELD_METRICS.append(("NEW_API_KEY", "field_new_ms", "field_new_category"))
```

Extraction happens automatically via the list-driven approach in `extract_metrics()`.
To add an output format:

- Create an `output_newformat(dataframe, output_path)` function
- Add the format to `VALID_OUTPUT_FORMATS`
- Add a branch in `cmd_audit()` to call it

To add a subcommand:

- Add a subparser in `build_argument_parser()`
- Create a `cmd_newcommand(args)` handler
- Add to the dispatch dict in `main()`
Users can define any number of profiles in pagespeed.toml under [profiles.name]. Each profile can override any setting.
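A plausible `pagespeed.toml` (the key names are illustrative and assumed to mirror the CLI flags; only the `[settings]` and `[profiles.name]` table names come from this document):

```toml
[settings]
strategy = "mobile"
workers = 4
delay = 1.5

[profiles.ci]
workers = 1
format = "json"

[profiles.full-audit]
strategy = "desktop"
categories = ["performance", "accessibility", "best-practices", "seo"]
```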
- Single-file constraint: All code is in one file (~800 lines). If the tool grows beyond ~1500 lines, consider splitting into a package with a `src/` layout.
- No persistent storage: Reports are flat files. No database for querying historical trends.
- FID deprecation: First Input Delay is deprecated in favor of INP. The tool extracts both but FID may be removed from the API.
- Field data availability: CrUX data requires sufficient real-user traffic. Low-traffic sites will have empty field data columns.
- CrUX API integration: Dedicated CrUX API for field-only data without running Lighthouse
- Trend analysis: a `trend` subcommand that loads multiple historical reports and charts score changes over time
- Slack/email notifications: Alert on regressions exceeding a threshold
- GitHub Actions integration: Run as a CI check on deployment
- PDF export: Generate PDF reports from the HTML template