Merged
47 commits
3e101ad
Copy the queries from 2024 to 2025
IvanUkhov Jul 24, 2025
100a649
Update the readme
IvanUkhov Jul 24, 2025
bd194f2
Update the INCLUDE pseudo-directives
IvanUkhov Jul 31, 2025
c1bcf92
Replace all with crawl
IvanUkhov Jul 31, 2025
d44598b
Update the dates to 2025-07-01
IvanUkhov Jul 31, 2025
89ec842
Update the usage of custom_metrics
IvanUkhov Jul 31, 2025
51975f2
Update the usage of summary
IvanUkhov Jul 31, 2025
18e5776
Update the usage of payload in the common functions
IvanUkhov Jul 31, 2025
bd63581
Update the usage of JSON_*
IvanUkhov Jul 31, 2025
702b4ea
Fix development/fonts_hinting
IvanUkhov Aug 2, 2025
7542b57
Add a Python script for validating the queries
IvanUkhov Aug 1, 2025
7fccfbe
Add development/styles_hyphens
IvanUkhov Aug 3, 2025
cabb64c
Replace all single dates with a placeholder
IvanUkhov Aug 3, 2025
565f37f
Replace all multiple dates with a placeholder
IvanUkhov Aug 3, 2025
a07049e
Replace 2025 with {year}
IvanUkhov Aug 3, 2025
eeabe04
Mention the parameters in the readme
IvanUkhov Aug 3, 2025
0a440cf
Remove performance/scripts_font_face.sql
IvanUkhov Aug 3, 2025
726077d
Introduce a precision parameter and round proportions
IvanUkhov Aug 3, 2025
8aa23c6
Sort by count, not proportion
IvanUkhov Aug 3, 2025
c23dea4
Add development/styles_text_wrap
IvanUkhov Aug 3, 2025
ee92a95
Do not print size if there is none
IvanUkhov Aug 3, 2025
2fb4d25
Create sheets for query results
IvanUkhov Aug 4, 2025
29a3ce8
Add a few comments
IvanUkhov Aug 4, 2025
a7e01f3
Name sheets by the question
IvanUkhov Aug 4, 2025
013368a
Populate the spreadsheet
IvanUkhov Aug 4, 2025
8c3bf11
Make a cosmetic adjustment
IvanUkhov Aug 4, 2025
cac0adc
Nullify NaNs
IvanUkhov Aug 4, 2025
a9bf33a
Address a lint
IvanUkhov Aug 4, 2025
fb5bcad
Exclude non-SQL files
IvanUkhov Aug 4, 2025
ffe8e6f
Add a parameter for controlling the number of workers
IvanUkhov Aug 4, 2025
8e3286f
Use SAFE.INT64 for respBodySize
IvanUkhov Aug 4, 2025
f96c378
Take the first line of the error
IvanUkhov Aug 4, 2025
a5e48f8
Cast file sizes to integers
IvanUkhov Aug 4, 2025
8874d28
Downsample in design/fonts_family_by_script.sql
IvanUkhov Aug 4, 2025
5fe0e2a
Fix a typo
IvanUkhov Aug 4, 2025
27e2ba0
Add rounding in design/fonts_metric.sql
IvanUkhov Aug 4, 2025
9d1efd6
Fix a typo
IvanUkhov Aug 4, 2025
e396c87
Fix the reporting of failures
IvanUkhov Aug 4, 2025
ee65748
Update the readme
IvanUkhov Aug 4, 2025
3787dd6
Update the usage of the Chrome UX report
IvanUkhov Aug 11, 2025
ae31d3a
Update the usage of parsed_css
IvanUkhov Aug 19, 2025
ee3a5f0
Use JSON instead of STRING in custom JavaScript functions
IvanUkhov Aug 19, 2025
dcb8044
Make a cosmetic adjustment
IvanUkhov Aug 19, 2025
14fd419
Remove JSON_QUERY in favor of direct indexing
IvanUkhov Aug 21, 2025
1e036ff
Simplify SCRIPTS
IvanUkhov Aug 22, 2025
e8f0482
Simplify HAS_EMOJI
IvanUkhov Aug 22, 2025
038a4ec
Do no use subsampling
IvanUkhov Aug 22, 2025
1 change: 1 addition & 0 deletions sql/2025/fonts/.gitignore
@@ -0,0 +1 @@
*.csv
92 changes: 78 additions & 14 deletions sql/2025/fonts/README.md
@@ -1,20 +1,84 @@
-# 2025 Fonts queries
+# Fonts
 
-<!--
-This directory contains all of the 2025 Fonts chapter queries.
+## Resources
 
-Each query should have a corresponding `metric_name.sql` file.
-Note that readers are linked to this directory, so try to make the SQL file names descriptive for easy browsing.
+* 📄 [Planning document]
+* 📊 [Results sheet]
+* 📝 [Chapter content]
 
-Analysts: if helpful, you can use this README to give additional info about the queries.
--->
+## Structure
 
-## Resources
+The queries are split by the section where they are used:
+
+* `design/` is about foundries and families,
+* `development/` is about tools and technologies, and
+* `performance/` is about hosting and serving.
+
+Each file name starts with one of the following prefixes indicating the primary subject of the corresponding analysis:
+
+* `fonts_` is about font files,
+* `pages_` is about HTML pages,
+* `scripts_` is about JavaScript scripts, and
+* `styles_` is about CSS style sheets.
+
+The prefix is followed by the property studied, given in the singular and potentially extended with one or several suffixes narrowing down the scope, as in `fonts_size_by_table.sql` and `pages_link_relation.sql`.
+
+## Content
+
+Each query starts with a preamble indicating the section, question, and normalization type, as illustrated below:
+
+```sql
+-- Section: Performance
+-- Question: What is the distribution of the file size broken down by table?
+-- Normalization: Pages
+```
+
+Many queries rely on temporary functions for convenience and clarity. The functions that appear in several queries are extracted into a common file called `common.sql`. Whenever a query uses any of the functions defined in `common.sql`, it has the following pseudo-directive at the top:
+
+```sql
+-- INCLUDE https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/{year}/fonts/common.sql
+```
+
+The pseudo-directive has to be replaced with the content of `common.sql` prior to executing the query in question.
+
+In addition, queries generally have parameters, as in `@date` and `@precision`, so that they can be run for different configurations. The values of the parameters have to be supplied upon execution.
+
+All of the above is taken care of automatically if the queries are executed using `execute.py`, which is discussed next.
+
+## Execution
+
+The queries can be executed using the `execute.py` script. The results are first saved in local CSV files next to the SQL files and then uploaded to the spreadsheet. In the spreadsheet, a separate sheet is created for each query and named after the question the query answers, as given in its preamble. If the CSV file already exists, the corresponding query is not executed. If cell A1 is already populated, the corresponding sheet is not updated.
+
+First, ensure that the Application Default Credentials authorization strategy is configured and that the HTTP Archive project is used as the quota project:
+
+```shell
+gcloud auth application-default login \
+  --scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/spreadsheets
+gcloud auth application-default set-quota-project httparchive
+```
+
+Second, install the Python prerequisites for the script:
+
+```shell
+pip install -r requirements.txt
+```
+
+The script can be run for all or a subset of the queries as illustrated below:
+
+```shell
+python execute.py
+python execute.py design/*.sql
+python execute.py development/fonts_*.sql
+```
+
+By default, it operates in dry-run mode: it does not run the queries but prints an estimate of the amount of data that each query would process. To actually run the queries, pass the `--no-dry-run` option as follows:
 
-- [📄 Planning doc][~google-doc]
-- [📊 Results sheet][~google-sheets]
-- [📝 Markdown file][~chapter-markdown]
+```shell
+python execute.py --no-dry-run
+python execute.py --no-dry-run design/*.sql
+python execute.py --no-dry-run development/fonts_*.sql
+```
 
-[~google-doc]: https://docs.google.com/document/d/1jVc0vgmAY_lBxryItRBguXxEq77mvbaQ3UpbTweUoSI/
-[~google-sheets]: https://docs.google.com/spreadsheets/d/1otdu4p_CCI70B4FVzw6k02frStsPMrQoFu7jUim_0Bg/edit
-[~chapter-markdown]: https://github.com/HTTPArchive/almanac.httparchive.org/tree/main/src/content/en/2025/fonts.md
+[Planning document]: https://docs.google.com/document/d/1jVc0vgmAY_lBxryItRBguXxEq77mvbaQ3UpbTweUoSI
+[Results sheet]: https://docs.google.com/spreadsheets/d/1otdu4p_CCI70B4FVzw6k02frStsPMrQoFu7jUim_0Bg
+[Chapter content]: https://github.com/HTTPArchive/almanac.httparchive.org/tree/main/src/content/en/2025/fonts.md
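The README leaves the INCLUDE substitution and the parameter handling to `execute.py`, whose source does not appear in this part of the diff. As a rough sketch of what that substitution involves, assuming the directive URL is resolved to a file in a local checkout of the repository rather than downloaded, the following Python expands the directive and lists the parameters a query expects. The names `include_path`, `expand_query`, and `query_parameters` are hypothetical and are not taken from `execute.py`.

```python
import re
from typing import Callable

# A whole-line pseudo-directive such as
# "-- INCLUDE https://github.com/.../blob/main/sql/{year}/fonts/common.sql".
INCLUDE_PATTERN = re.compile(r"^-- INCLUDE (\S+)[ \t]*$", re.MULTILINE)
# Named query parameters such as @date and @precision (a heuristic: it would
# also match an "@" followed by a word inside a string literal or comment).
PARAMETER_PATTERN = re.compile(r"@(\w+)")


def include_path(url: str, year: int) -> str:
    """Map a directive URL to a repository-relative path.

    Assumes the URL points into the main branch via ".../blob/main/...".
    """
    relative = url.split("/blob/main/", 1)[1]
    return relative.replace("{year}", str(year))


def expand_query(query: str, year: int, read: Callable[[str], str]) -> str:
    """Replace every INCLUDE pseudo-directive with the referenced file's text.

    `read` maps a repository-relative path to file contents, for instance
    lambda path: pathlib.Path(path).read_text(). A function is passed to
    re.sub so that backslashes in common.sql are inserted literally instead
    of being interpreted as group references.
    """

    def replace(match: re.Match) -> str:
        return read(include_path(match.group(1), year)).rstrip("\n")

    return INCLUDE_PATTERN.sub(replace, query)


def query_parameters(query: str) -> set:
    """Return the names of the parameters whose values must be supplied."""
    return set(PARAMETER_PATTERN.findall(query))
```

Resolving the URL to a checked-out path keeps the expanded query in sync with local edits to `common.sql`; whether `execute.py` actually works this way is not visible here.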
149 changes: 149 additions & 0 deletions sql/2025/fonts/common.sql
@@ -0,0 +1,149 @@
-- Normalize a family name. Used in FAMILY_INNER.
CREATE TEMPORARY FUNCTION FAMILY_INNER_INNER(name STRING) AS (
CASE
WHEN REGEXP_CONTAINS(name, r'(?i)font\s?awesome') THEN 'Font Awesome'
ELSE IF(LENGTH(TRIM(name)) < 3, NULL, NULLIF(TRIM(name), ''))
END
);

-- Normalize a family name. Used in FAMILY.
CREATE TEMPORARY FUNCTION FAMILY_INNER(name STRING) AS (
FAMILY_INNER_INNER(
REGEXP_REPLACE(
name,
r'(?i)([\s-]?(black|bold|book|cond(ensed)?|demi|ex(tra)?|heavy|italic|light|medium|narrow|regular|semi|thin|ultra|wide|\d00|\d+pt))+$',
''
)
)
);

-- Extract the family name from a payload.
CREATE TEMPORARY FUNCTION FAMILY(payload JSON) AS (
FAMILY_INNER(
COALESCE(
STRING(payload._font_details.names[16]),
STRING(payload._font_details.names[1])
)
)
);

-- Extract the file format from an extension and a MIME type.
CREATE TEMPORARY FUNCTION FILE_FORMAT(extension STRING, type STRING) AS (
LOWER(IFNULL(REGEXP_EXTRACT(type, '/(?:x-)?(?:font-)?(.*)'), extension))
);

-- Normalize a foundry name. Used in FOUNDRY.
CREATE TEMPORARY FUNCTION FOUNDRY_INNER(name STRING) AS (
CASE UPPER(name)
WHEN 'ADBO' THEN 'ADBE'
WHEN 'PFED' THEN 'AWSM'
ELSE NULLIF(TRIM(REGEXP_REPLACE(name, r'[[:cntrl:]]+', '')), '')
END
);

-- Extract the foundry name from a payload.
CREATE TEMPORARY FUNCTION FOUNDRY(payload JSON) AS (
FOUNDRY_INNER(STRING(payload._font_details.OS2.achVendID))
);

-- Infer scripts from codepoints. Used in SCRIPTS.
CREATE TEMPORARY FUNCTION SCRIPTS_INNER(codepoints JSON)
RETURNS ARRAY<STRING>
LANGUAGE js
OPTIONS (library = ["gs://httparchive/lib/text-utils.js"])
AS r"""
if (codepoints && codepoints.length) {
return detectWritingScript(codepoints.map((character) => parseInt(character, 10)), 0.05);
} else {
return [];
}
""";

-- Infer scripts from a payload.
CREATE TEMPORARY FUNCTION SCRIPTS(payload JSON) AS (
SCRIPTS_INNER(payload._font_details.cmap.codepoints)
);

-- Infer the service from a URL.
CREATE TEMPORARY FUNCTION SERVICE(url STRING) AS (
CASE
WHEN REGEXP_CONTAINS(url, r'(fonts|use)\.typekit\.(net|com)') THEN 'Adobe'
WHEN REGEXP_CONTAINS(url, r'cloud\.typenetwork\.com') THEN 'typenetwork.com'
WHEN REGEXP_CONTAINS(url, r'cloud\.typography\.com') THEN 'typography.com'
WHEN REGEXP_CONTAINS(url, r'cloud\.webtype\.com') THEN 'webtype.com'
WHEN REGEXP_CONTAINS(url, r'f\.fontdeck\.com') THEN 'fontdeck.com'
WHEN REGEXP_CONTAINS(url, r'fast\.fonts\.(com|net)\/(jsapi|cssapi)') THEN 'fonts.com'
WHEN REGEXP_CONTAINS(url, r'fnt\.webink\.com') THEN 'webink.com'
WHEN REGEXP_CONTAINS(url, r'fontawesome\.com') THEN 'fontawesome.com'
WHEN REGEXP_CONTAINS(url, r'fonts\.(gstatic|googleapis)\.com|themes.googleusercontent.com/static/fonts|ssl.gstatic.com/fonts') THEN 'Google'
WHEN REGEXP_CONTAINS(url, r'fonts\.typonine\.com') THEN 'typonine.com'
WHEN REGEXP_CONTAINS(url, r'fonts\.typotheque\.com') THEN 'typotheque.com'
WHEN REGEXP_CONTAINS(url, r'kernest\.com') THEN 'kernest.com'
WHEN REGEXP_CONTAINS(url, r'typefront\.com') THEN 'typefront.com'
WHEN REGEXP_CONTAINS(url, r'typesquare\.com') THEN 'typesquare.com'
WHEN REGEXP_CONTAINS(url, r'use\.edgefonts\.net|webfonts\.creativecloud\.com') THEN 'edgefonts.net'
WHEN REGEXP_CONTAINS(url, r'webfont\.fontplus\.jp') THEN 'fontplus.jp'
WHEN REGEXP_CONTAINS(url, r'webfonts\.fontslive\.com') THEN 'fontslive.com'
WHEN REGEXP_CONTAINS(url, r'webfonts\.fontstand\.com') THEN 'fontstand.com'
WHEN REGEXP_CONTAINS(url, r'webfonts\.justanotherfoundry\.com') THEN 'justanotherfoundry.com'
ELSE 'self-hosted'
END
);

-- Extract the color formats from a formats payload and remove spurious entries
-- via a table-sizes payload.
--
-- When nonempty, it is expected that
--
-- * `CBDT` is larger than 2 + 2 bytes,
-- * `COLR` is larger than 2 + 2 + 4 + 4 + 2 (+ 4 + 4 + 4 + 4 + 4) bytes,
-- * `SVG ` is larger than 2 + 4 + 4 + 2 bytes, and
-- * `sbix` is larger than 2 + 2 + 4 + 4 bytes.
--
-- For simplicity, the threshold is set to 50 bytes.
CREATE TEMPORARY FUNCTION COLOR_FORMATS_INNER(formats JSON, table_sizes JSON)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
try {
return formats.filter((format) => {
const table = `${format} `.slice(0, 4);
return table_sizes[table] > 50;
});
} catch (e) {
return [];
}
''';

-- Extract the color formats from a payload.
CREATE TEMPORARY FUNCTION COLOR_FORMATS(payload JSON) AS (
COLOR_FORMATS_INNER(
payload._font_details.color.formats,
payload._font_details.table_sizes
)
);

-- Check if the font is a color font given its payload.
CREATE TEMPORARY FUNCTION IS_COLOR(payload JSON) AS (
ARRAY_LENGTH(COLOR_FORMATS(payload)) > 0
);

-- Check if the font was successfully parsed given its payload.
CREATE TEMPORARY FUNCTION IS_PARSED(payload JSON) AS (
payload._font_details.table_sizes IS NOT NULL
);

-- Check if the font is a variable font given its payload.
CREATE TEMPORARY FUNCTION IS_VARIABLE(payload JSON) AS (
REGEXP_CONTAINS(
TO_JSON_STRING(payload._font_details.table_sizes),
'(?i)gvar|CFF2'
)
);

-- Extract the variable formats from a payload.
CREATE TEMPORARY FUNCTION VARIABLE_FORMATS(payload JSON) AS (
REGEXP_EXTRACT_ALL(
TO_JSON_STRING(payload._font_details.table_sizes),
'(?i)glyf|CFF2'
)
);
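To make the behavior of a few of the helpers above easier to check, here is a rough Python mirror of `FAMILY_INNER`/`FAMILY_INNER_INNER`, `FILE_FORMAT`, and `COLOR_FORMATS_INNER`. It is an illustration only, not part of the PR: the Python names are hypothetical, and Python's `re` module is assumed to behave like BigQuery's RE2 for these particular patterns (one known difference is that Python's `\s` also matches non-ASCII whitespace).

```python
import re
from typing import Optional

# Same suffix pattern as FAMILY_INNER: style, weight, and width words (or numeric
# weights and point sizes), optionally preceded by a space or hyphen, at the end.
STYLE_SUFFIXES = re.compile(
    r"([\s-]?(black|bold|book|cond(ensed)?|demi|ex(tra)?|heavy|italic|light|"
    r"medium|narrow|regular|semi|thin|ultra|wide|\d00|\d+pt))+$",
    re.IGNORECASE,
)
FONT_AWESOME = re.compile(r"font\s?awesome", re.IGNORECASE)
# Same pattern as FILE_FORMAT: the MIME subtype without "x-" and "font-" prefixes.
MIME_SUBTYPE = re.compile(r"/(?:x-)?(?:font-)?(.*)")


def family(name: Optional[str]) -> Optional[str]:
    """Mirror FAMILY_INNER: strip style suffixes, then apply FAMILY_INNER_INNER."""
    if name is None:
        return None
    name = STYLE_SUFFIXES.sub("", name)
    if FONT_AWESOME.search(name):
        return "Font Awesome"
    name = name.strip()
    # Names shorter than three characters (including empty ones) become NULL.
    return name if len(name) >= 3 else None


def file_format(extension: Optional[str], mime_type: Optional[str]) -> Optional[str]:
    """Mirror FILE_FORMAT: prefer the MIME subtype and fall back to the extension."""
    match = MIME_SUBTYPE.search(mime_type) if mime_type else None
    value = match.group(1) if match else extension
    return value.lower() if value is not None else None


def color_formats(formats: Optional[list], table_sizes: Optional[dict]) -> list:
    """Mirror COLOR_FORMATS_INNER: keep formats whose table exceeds 50 bytes."""
    if formats is None or table_sizes is None:
        return []  # the JavaScript version returns [] when it throws
    kept = []
    for fmt in formats:
        table = f"{fmt} "[:4]  # pad "SVG" to the four-character tag "SVG "
        if table_sizes.get(table, 0) > 50:
            kept.append(fmt)
    return kept
```

Under these assumptions, `family("Poppins-SemiBold")` yields `"Poppins"` because the repeated group strips both `-Semi` and `Bold`, and `file_format("TTF", "application/x-font-ttf")` yields `"ttf"`.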
54 changes: 54 additions & 0 deletions sql/2025/fonts/design/fonts_designer.sql
@@ -0,0 +1,54 @@
-- Section: Design
-- Question: Which designers are popular?
-- Normalization: Pages

-- INCLUDE https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/{year}/fonts/common.sql

WITH
designers AS (
SELECT
client,
NULLIF(TRIM(STRING(payload._font_details.names[9])), '') AS designer,
COUNT(DISTINCT page) AS count,
ROW_NUMBER() OVER (PARTITION BY client ORDER BY COUNT(DISTINCT page) DESC) AS rank
FROM
`httparchive.crawl.requests`
WHERE
date = @date AND
type = 'font' AND
is_root_page AND
IS_PARSED(payload)
GROUP BY
client,
designer
QUALIFY
rank <= 100
),

pages AS (
SELECT
client,
COUNT(DISTINCT page) AS total
FROM
`httparchive.crawl.requests`
WHERE
date = @date AND
is_root_page
GROUP BY
client
)

SELECT
client,
designer,
count,
total,
ROUND(count / total, @precision) AS proportion
FROM
designers
JOIN
pages
USING (client)
ORDER BY
client,
count DESC
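The designer query combines two aggregations: distinct pages per designer (parsed font requests on root pages only), ranked and capped at 100 per client, and distinct root pages per client as the denominator, whether or not they load fonts. The following Python sketch reproduces that logic on in-memory rows to make the normalization explicit. The function name, the tuple layout, and the default `precision` of 4 are assumptions for illustration, not values taken from the PR.

```python
from collections import defaultdict


def top_designers(font_requests, root_pages, limit=100, precision=4):
    """Rank designers by the number of distinct pages using their fonts.

    font_requests: iterable of (client, page, designer) for parsed font
        requests on root pages (the numerator side of the query).
    root_pages: iterable of (client, page) for all root pages (the denominator).
    precision: hypothetical default standing in for the @precision parameter.
    """
    pages_by_designer = defaultdict(set)
    for client, page, designer in font_requests:
        pages_by_designer[(client, designer)].add(page)  # COUNT(DISTINCT page)

    totals = defaultdict(set)
    for client, page in root_pages:
        totals[client].add(page)

    by_client = defaultdict(list)
    for (client, designer), pages in pages_by_designer.items():
        by_client[client].append((designer, len(pages)))

    rows = []
    for client in sorted(by_client):
        # ROW_NUMBER() ... ORDER BY count DESC, then QUALIFY rank <= limit.
        ranked = sorted(by_client[client], key=lambda item: item[1], reverse=True)[:limit]
        total = len(totals[client])
        for designer, count in ranked:
            rows.append((client, designer, count, total, round(count / total, precision)))
    return rows
```

Python's `round` rounds halves to even, whereas BigQuery's `ROUND` rounds halves away from zero, so the last digit may differ on exact ties. BigQuery also breaks ties in `ROW_NUMBER` arbitrarily, while the sketch keeps input order.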
42 changes: 42 additions & 0 deletions sql/2025/fonts/design/fonts_family_by_foundry.sql
@@ -0,0 +1,42 @@
-- Section: Design
-- Question: Which families are used broken down by foundry?
-- Normalization: Requests (parsed only)

-- INCLUDE https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/{year}/fonts/common.sql

WITH
requests AS (
SELECT
client,
FOUNDRY(payload) AS foundry,
FAMILY(payload) AS family,
COUNT(0) OVER (PARTITION BY client) AS total
FROM
`httparchive.crawl.requests`
WHERE
date = @date AND
type = 'font' AND
IS_PARSED(payload) AND
is_root_page
)

SELECT
client,
foundry,
family,
COUNT(0) AS count,
total,
ROUND(COUNT(0) / total, @precision) AS proportion,
ROW_NUMBER() OVER (PARTITION BY client ORDER BY COUNT(0) DESC) AS rank
FROM
requests
GROUP BY
client,
foundry,
family,
total
QUALIFY
rank <= 100
ORDER BY
client,
count DESC
46 changes: 46 additions & 0 deletions sql/2025/fonts/design/fonts_family_by_script.sql
@@ -0,0 +1,46 @@
-- Section: Design
-- Question: Which families are used broken down by script?
-- Normalization: Requests (parsed only)

-- INCLUDE https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/{year}/fonts/common.sql

WITH
requests AS (
SELECT
client,
SCRIPTS(payload) AS scripts,
FAMILY(payload) AS family,
COUNT(0) OVER (PARTITION BY client) AS total
FROM
`httparchive.crawl.requests`
WHERE
date = @date AND
type = 'font' AND
is_root_page AND
IS_PARSED(payload)
)

SELECT
client,
script,
family,
COUNT(0) AS count,
total AS total,
ROUND(COUNT(0) / total, @precision) AS proportion,
ROW_NUMBER() OVER (PARTITION BY client, script ORDER BY COUNT(0) DESC) AS rank
FROM
requests,
UNNEST(scripts) AS script
WHERE
family != 'Adobe Blank'
GROUP BY
client,
script,
family,
requests.total
QUALIFY
rank <= 10
ORDER BY
client,
script,
count DESC
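In the script query, the comma join with `UNNEST(scripts)` emits one row per detected script, while `total` is computed in the `requests` CTE before the join and before the `family != 'Adobe Blank'` filter (which, being a SQL comparison, also drops rows whose family is NULL). The Python sketch below mimics that order of operations on in-memory rows; the function name, the tuple layout, and the default `precision` of 4 are hypothetical.

```python
from collections import Counter, defaultdict


def top_families_by_script(requests, limit=10, precision=4):
    """Rank families within each (client, script) pair.

    requests: list of (client, scripts, family), one per parsed font request.
    precision: hypothetical default standing in for the @precision parameter.
    """
    # COUNT(0) OVER (PARTITION BY client), evaluated before UNNEST and WHERE.
    totals = Counter(client for client, _, _ in requests)

    counts = Counter()
    for client, scripts, family in requests:
        # WHERE family != 'Adobe Blank' also filters out NULL families.
        if family is None or family == "Adobe Blank":
            continue
        for script in scripts:  # UNNEST(scripts): one row per detected script
            counts[(client, script, family)] += 1

    groups = defaultdict(list)
    for (client, script, family), count in counts.items():
        groups[(client, script)].append((family, count))

    rows = []
    for client, script in sorted(groups):
        ranked = sorted(groups[(client, script)], key=lambda item: item[1], reverse=True)[:limit]
        for family, count in ranked:
            total = totals[client]
            rows.append((client, script, family, count, total, round(count / total, precision)))
    return rows
```

Because the denominator counts each request once, a font detected as both Latin and Cyrillic contributes to two rows but only once to `total`, so the proportions are shares of all parsed font requests and need not sum to 1 across scripts.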