Fonts 2025 queries #4175
@tunetheweb, I think you were the one reviewing the queries last year. If you do not mind, I would like to invite you to review this year, too, but please feel free to assign someone else. This year, we did not change anything. We just migrated the queries to
(The linter is failing due to the code elsewhere.)
Fixing in #4196
That's fixed in After that, are you good to merge this?
Thank you. Rebased. Well, I have not received any feedback from the lead. I would merge if you are OK with potential follow-up PRs.
Yeah, let's do that.
Makes progress on #4073
Fonts
Resources
Structure
The queries are split by the section where they are used:
- design/ is about foundries and families,
- development/ is about tools and technologies, and
- performance/ is about hosting and serving.

Each file name starts with one of the following prefixes indicating the primary subject of the corresponding analysis:

- fonts_ is about font files,
- pages_ is about HTML pages,
- scripts_ is about JavaScript scripts, and
- styles_ is about CSS style sheets.

The prefix is followed by the property studied, given in the singular and potentially extended by one or several suffixes narrowing down the scope, as in fonts_size_by_table.sql and pages_link_relation.sql.

Content
Each query starts with a preamble indicating the section, question, and normalization type, as illustrated below:
Many queries rely on temporary functions for convenience and clarity. The functions that appear in several queries are extracted into a common file called common.sql. Whenever a query uses any of the functions defined in common.sql, it has the following pseudo-directive at the top:

-- INCLUDE https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/{year}/fonts/common.sql

The pseudo-directive has to be replaced with the content of common.sql before the query in question is executed.

In addition, queries generally have parameters, such as @date, so that they can be run for different configurations. The values of the parameters have to be supplied upon execution.

All of the above is taken care of automatically if the queries are executed using execute.py, which we discuss next.

Execution
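The preprocessing described in the Content section (expanding the INCLUDE pseudo-directive and collecting the parameters that need values) can be sketched in Python. This is a hypothetical illustration, not the actual code of execute.py: the helper names resolve_includes and missing_parameters are made up, and it assumes that common.sql is available locally under the last path component of the INCLUDE URL.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not taken from execute.py.

# A whole line of the form "-- INCLUDE <url>". Using [ \t]* instead of \s*
# keeps the pattern from swallowing the newline that ends the line.
INCLUDE_PATTERN = re.compile(r"^--[ \t]*INCLUDE[ \t]+(\S+)[ \t]*$", re.MULTILINE)
# Named parameters such as @date. Naive: it also matches "@" inside string
# literals and comments.
PARAMETER_PATTERN = re.compile(r"@(\w+)")


def resolve_includes(query: str, directory: Path) -> str:
    """Replace each INCLUDE pseudo-directive with the referenced file's content.

    Assumption: the file (e.g. common.sql) is stored in `directory` under the
    last path component of the URL.
    """

    def substitute(match: re.Match) -> str:
        file_name = match.group(1).rsplit("/", 1)[-1]
        return (directory / file_name).read_text()

    return INCLUDE_PATTERN.sub(substitute, query)


def missing_parameters(query: str, values: dict[str, str]) -> set[str]:
    """Return the names of the parameters used in `query` that have no value."""
    return set(PARAMETER_PATTERN.findall(query)) - set(values)
```

Once every parameter has a value, one option is to pass the values to BigQuery as named query parameters (BigQuery uses the same @name syntax) instead of splicing them into the SQL text.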
The queries can be executed using the execute.py script. The results are first saved in local CSV files next to the SQL files and then uploaded to the spreadsheet. In the spreadsheet, a separate sheet is created for each query and named after the question the query answers, which is given in its preamble. If the CSV file already exists, the corresponding query is not executed. If cell A1 is already populated, the corresponding sheet is not updated.

First, ensure that the Application Default Credentials authorization strategy is configured and that the HTTP Archive project is used as the quota project:
Second, install the Python prerequisites for the script:
The script can be run for all or a subset of the queries as illustrated below:
By default, it operates in a dry-run mode: it does not run the queries but prints an estimate of the amount of data that would be processed by each query. To actually run the queries, pass the --no-dry-run option as follows:
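The control flow described in this section (skipping queries whose CSV file already exists, leaving sheets whose cell A1 is populated untouched, and doing a dry run unless --no-dry-run is given) can be outlined in Python. This is a hypothetical sketch, not the code of execute.py: the positional queries argument and the helper names parse_arguments, decide, and should_update_sheet are invented, and skipping existing CSV files in dry-run mode as well is an assumption.

```python
import argparse
from pathlib import Path

# Hypothetical outline of the control flow; not the actual code of execute.py.


def parse_arguments(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the fonts queries.")
    # Hypothetical: the subset of SQL files to process (all if none given).
    parser.add_argument("queries", nargs="*", type=Path)
    # BooleanOptionalAction (Python 3.9+) defines both --dry-run and
    # --no-dry-run; dry runs are the default, as described above.
    parser.add_argument(
        "--dry-run", action=argparse.BooleanOptionalAction, default=True
    )
    return parser.parse_args(argv)


def decide(query: Path, dry_run: bool) -> str:
    """Return what to do with a query: "skip", "estimate", or "execute"."""
    # Assumption: the CSV file shares the SQL file's name and directory.
    if query.with_suffix(".csv").exists():
        return "skip"  # results are already saved locally
    return "estimate" if dry_run else "execute"


def should_update_sheet(rows: list[list[str]]) -> bool:
    """Return True unless cell A1 (rows[0][0]) is already populated."""
    return not (rows and rows[0] and rows[0][0])
```

For example, parse_arguments(["--no-dry-run"]).dry_run is False, while parsing no arguments leaves dry_run at True. Making the dry run the default means that forgetting a flag costs nothing, whereas running queries over large datasets does.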