
chore: refactor the logql benchmark suite #20845

Open
trevorwhitney wants to merge 18 commits into main from twhitney/improve-logql-benchmarks

Conversation

@trevorwhitney (Collaborator)

What this PR does / why we need it:

This PR changes how queries are defined for the LogQL benchmarks. Instead of a hard-coded set of queries run with many combinations of selectors, it introduces a query registry: a set of three folders containing YAML files that define different sets of queries. The folders separate the queries into three "suites": fast for local sanity checks, regression for CI, and exhaustive for nightly runs against real data.

To enable running these queries against real data, this PR introduces the concept of a metadata file for a dataset and retrofits the data generator to produce one when generating fake data. The hope is that, in the future, we can write a tool to produce this metadata from a snapshot of real data.

The metadata allows the queries to be templated with selectors, fields, structured metadata, and other properties that actually exist in the dataset, ensuring the queries can run against different datasets.
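As a rough illustration of what a templated query definition might look like (only kind and time_range appear in snippets elsewhere in this PR; every other field name here is an assumption, and the authoritative format is the JSON schema in pkg/logql/bench/queries/schema.json):

```yaml
# Hypothetical file in one of the suite folders, e.g. queries/fast/.
# ${SELECTOR} and ${RANGE} are placeholders the metadata resolver
# fills in from the dataset metadata; the requirements block
# (name assumed) constrains which streams qualify.
query: 'sum(rate(${SELECTOR} | logfmt [${RANGE}]))'
kind: metric
time_range:
  length: 24h
  step: 1m
requirements:
  log_format: logfmt
```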

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

Copilot AI review requested due to automatic review settings (February 17, 2026 21:14)
trevorwhitney requested a review from a team as a code owner (February 17, 2026 21:14)
Copilot AI (Contributor) left a comment:

Pull request overview

This PR refactors the LogQL benchmark suite to use a more flexible query registry system instead of hard-coded query generation. The changes enable running different benchmark suites (fast, regression, exhaustive) with queries defined in YAML files that are templated against actual dataset metadata.

Changes:

  • Introduces a query registry system with YAML-based query definitions organized into three suites: fast (local sanity checks), regression (CI), and exhaustive (nightly runs)
  • Adds dataset metadata generation and loading to support query templating with actual stream selectors, fields, and labels (a sketch of this metadata's shape follows the list)
  • Replaces hard-coded TestCaseGenerator with a flexible QueryRegistry and MetadataVariableResolver system
  • Refactors generator to track stream formats and adds metrics unregistration to avoid test conflicts
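As a sketch only, the metadata file might carry information along these lines. The field names are hypothetical; the real structure is defined in pkg/logql/bench/metadata.go and written to data/dataset_metadata.json. The example values are drawn from the label, structured-metadata, and unwrappable-field sets quoted later in this review thread.

```yaml
# Hypothetical shape of the dataset metadata used for templating.
# Key names are illustrative, not the actual Go struct fields.
streams:
  - selector: '{service_name="loki", region="us-east-1"}'
    format: logfmt
labels: [cluster, namespace, region, service_name]
structured_metadata: [detected_level, pod, span_id, trace_id]
unwrappable_fields: [bytes, duration, status]
```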

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| pkg/logql/bench/testcase.go | Moves TestCase struct definition from deleted generator_query.go file |
| pkg/logql/bench/query_registry.go | Implements query registry for loading and expanding YAML query definitions |
| pkg/logql/bench/metadata.go | Builds and manages dataset metadata for query templating |
| pkg/logql/bench/metadata_resolver.go | Resolves query variables (${SELECTOR}, etc.) based on dataset metadata |
| pkg/logql/bench/metadata_test.go | Tests for metadata building, saving, and loading |
| pkg/logql/bench/metadata_resolver_test.go | Tests for variable resolution logic |
| pkg/logql/bench/integration_metadata_test.go | Integration tests for metadata generation workflow |
| pkg/logql/bench/store_chunk.go | Adds registerer parameter and metrics unregistration |
| pkg/logql/bench/store.go | Integrates metadata generation into data builder |
| pkg/logql/bench/generator.go | Renames Application to Service, adds Format field to StreamMetadata |
| pkg/logql/bench/faker.go | Adds LogFormat enum and Format field to Service definitions |
| pkg/logql/bench/bench_test.go | Updates tests to use query registry instead of hard-coded generator |
| pkg/logql/bench/cmd/bench/main.go | Updates CLI to load and expand queries from registry |
| pkg/logql/bench/queries/schema.json | JSON schema for validating query YAML files |
| pkg/logql/bench/queries/*/**.yaml | YAML files defining queries for different test suites |


trevorwhitney force-pushed the twhitney/improve-logql-benchmarks branch from 2818f7a to e02f77d (February 17, 2026 22:37)
- "pkg/compute/**"
- "pkg/dataobj/**"
- "pkg/engine/**"
- "pkg/logql/bench/**"
Collaborator (Author):
this was all I added, but I also alphabetized them

The test jobs were checking out the latest commit on the branch while trying to download artifacts generated from an older commit (when using workflow_dispatch with a specific ref). This caused artifact-not-found errors.

Fix by:
- Adding the commit SHA as an output from the generate-testdata job
- Using that commit SHA in the test jobs for both checkout and artifact names
- Ensuring all jobs use the exact same commit SHA
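A minimal sketch of that pattern, assuming standard GitHub Actions syntax (the job names and artifact naming scheme are illustrative, not the repository's actual workflow):

```yaml
jobs:
  generate-testdata:
    runs-on: ubuntu-latest
    outputs:
      # Expose the checked-out commit SHA so downstream jobs can pin to it.
      sha: ${{ steps.sha.outputs.sha }}
    steps:
      - uses: actions/checkout@v4
      - id: sha
        run: echo "sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"
      # ... generate the test data and upload it, named with the SHA ...

  test:
    needs: generate-testdata
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out exactly the commit the artifacts were built from.
          ref: ${{ needs.generate-testdata.outputs.sha }}
      - uses: actions/download-artifact@v4
        with:
          name: testdata-${{ needs.generate-testdata.outputs.sha }}
```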

TestPrintBenchmarkQueries calls loadTestCases, which requires dataset metadata that's only generated with the -slow-tests flag. Without the guard, the test fails in CI when the metadata file doesn't exist.

This aligns it with other tests like TestStorageEquality that also require generated data.

The previous fix added a slow-test guard, but this test should be fast and not require the generated dataset file. Instead, generate the metadata in memory by creating a Generator and calling generateStreamMetadata().

This restores the original behavior where the test runs quickly without needing the data/dataset_metadata.json file, while still using the new query registry infrastructure.

Changes:
- Add a GenerateInMemoryMetadata() helper to create metadata without file I/O
- Update TestPrintBenchmarkQueries to generate metadata in memory
- Remove the slow-test guard since the test is fast again
Comment on lines 20 to 76
```go
// Bounded sets: characteristics common to all datasets
// These define the set of possible queries
var (
	// unwrappableFields are numeric fields that can be used with | unwrap
	// Mapped from applications:
	unwrappableFields = []string{
		"bytes",
		"duration",
		"rows_affected",
		"size",
		"spans",
		"status",
		"streams",
		"ttl",
	}

	// filterableKeywords are strings that commonly appear in log content
	// Used for line filter queries like |= "level" or |~ "error"
	filterableKeywords = []string{
		"DEBUG",
		"ERROR",
		"INFO",
		"WARN",
		"debug",
		"duration",
		"error",
		"failed",
		"info",
		"level",
		"status",
		"success",
		"warn",
	}

	// structuredMetadataKeys are keys used in structured metadata
	// These appear as structured_metadata in log entries
	structuredMetadataKeys = []string{
		"detected_level",
		"pod",
		"span_id",
		"trace_id",
	}

	// labelKeys are indexed or parsed labels
	// Used for label selectors and/or grouping (by/without)
	labelKeys = []string{
		"cluster",
		"component",
		"detected_level",
		"env",
		"level",
		"namespace",
		"pod",
		"region",
		"service_name",
	}
)
```
Contributor:

these variables are not used, do you want to keep them for info?

Collaborator (Author):

Ahh, thank you. I meant to circle back around and validate that the query requirements were in this set; done.

```yaml
kind: metric
time_range:
  length: 24h
  step: 1m
```
Contributor:

Maybe not part of this PR, but we need a few test cases for metric queries with different combinations of step and interval values: step < interval, step == interval, step > interval. We have had bugs there before that our correctness tests did not catch.

Contributor:

Specifically when the query length (e.g. 24h) is not evenly divisible by the step (e.g. 1m26s).
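For illustration, sketches of what such followup cases might look like, reusing the kind and time_range fields from the snippet above (treat the exact layout as an assumption; the real format is governed by pkg/logql/bench/queries/schema.json):

```yaml
# Hypothetical followup cases. "interval" here means the range
# interval inside the query expression (e.g. [5m]); "step" is the
# evaluation step of the metric query.

# step < interval (query range assumed to be [5m])
kind: metric
time_range:
  length: 24h
  step: 1m
---
# step == interval
kind: metric
time_range:
  length: 24h
  step: 5m
---
# step > interval
kind: metric
time_range:
  length: 24h
  step: 15m
---
# length not evenly divisible by step
kind: metric
time_range:
  length: 24h
  step: 1m26s
```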

Collaborator (Author):

let's do that in a followup

Adds extractSelectorFromQuery() to extract selectors from queries, and falls back to default ranges (5m/1m) when selector resolution fails.

Replaced all hardcoded service_name selectors (e.g., {service_name="loki"}) with the ${SELECTOR} placeholder in queries that use ${RANGE}. This allows the variable resolver to choose appropriate streams based on requirements.

Changes:
- Replaced 70+ hardcoded selectors with ${SELECTOR}
- Removed incorrect 'labels' requirements (level is parsed, not a stream label)
- Fixed drilldown queries to use 'structured_metadata' for detected_level
- Maintained all log_format and unwrappable_fields requirements

This is cleaner than parsing selectors from queries and properly uses the variable resolution system.
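A hedged before/after sketch of that substitution (the query and requirements field names are assumptions; the placeholders and the example selector are taken from the commits above):

```yaml
# Before: selector hard-coded to one dataset
query: 'sum(rate({service_name="loki"} [${RANGE}]))'
---
# After: the variable resolver picks a stream matching the requirements
query: 'sum(rate(${SELECTOR} | logfmt [${RANGE}]))'
requirements:
  log_format: logfmt
```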