
chore: refactor the logql benchmark suite #20845

Open
trevorwhitney wants to merge 18 commits into main from twhitney/improve-logql-benchmarks

Conversation

@trevorwhitney (Collaborator)

What this PR does / why we need it:

This PR changes how queries are defined for the LogQL benchmarks. Instead of a hard-coded set of queries run with many combinations of selectors, it introduces a query registry: a set of three folders containing YAML files that define different sets of queries. The folders separate the queries into three "suites": fast for local sanity checks, regression for CI, and exhaustive for nightly runs against real data.

To enable running these queries against real data, this PR introduces the concept of a metadata file for a dataset and retrofits the data generator to produce one when generating fake data. The hope is that, in the future, we can write a tool to produce this metadata from a snapshot of real data.

The metadata allows the queries to be templated with selectors, fields, structured metadata, and other properties that actually exist in the dataset, ensuring the queries can run against different datasets.
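As a rough illustration of what a templated query definition might look like (only kind and time_range appear in snippets elsewhere in this PR; every other field name here is an assumption, and the authoritative format is the JSON schema in pkg/logql/bench/queries/schema.json):

```yaml
# Hypothetical file in one of the suite folders, e.g. queries/fast/.
# ${SELECTOR} and ${RANGE} are placeholders the metadata resolver
# fills in from the dataset metadata; the requirements block
# (name assumed) constrains which streams qualify.
query: 'sum(rate(${SELECTOR} | logfmt [${RANGE}]))'
kind: metric
time_range:
  length: 24h
  step: 1m
requirements:
  log_format: logfmt
```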

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

Copilot AI review requested due to automatic review settings (February 17, 2026 21:14)
trevorwhitney requested a review from a team as a code owner (February 17, 2026 21:14)
Copilot AI (Contributor) left a comment:

Pull request overview

This PR refactors the LogQL benchmark suite to use a more flexible query registry system instead of hard-coded query generation. The changes enable running different benchmark suites (fast, regression, exhaustive) with queries defined in YAML files that are templated against actual dataset metadata.

Changes:

  • Introduces a query registry system with YAML-based query definitions organized into three suites: fast (local sanity checks), regression (CI), and exhaustive (nightly runs)
  • Adds dataset metadata generation and loading to support query templating with actual stream selectors, fields, and labels (a sketch of this metadata's shape follows the list)
  • Replaces hard-coded TestCaseGenerator with a flexible QueryRegistry and MetadataVariableResolver system
  • Refactors generator to track stream formats and adds metrics unregistration to avoid test conflicts
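As a sketch only, the metadata file might carry information along these lines. The field names are hypothetical; the real structure is defined in pkg/logql/bench/metadata.go and written to data/dataset_metadata.json. The example values are drawn from the label, structured-metadata, and unwrappable-field sets quoted later in this review thread.

```yaml
# Hypothetical shape of the dataset metadata used for templating.
# Key names are illustrative, not the actual Go struct fields.
streams:
  - selector: '{service_name="loki", region="us-east-1"}'
    format: logfmt
labels: [cluster, namespace, region, service_name]
structured_metadata: [detected_level, pod, span_id, trace_id]
unwrappable_fields: [bytes, duration, status]
```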

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| pkg/logql/bench/testcase.go | Moves TestCase struct definition from deleted generator_query.go file |
| pkg/logql/bench/query_registry.go | Implements query registry for loading and expanding YAML query definitions |
| pkg/logql/bench/metadata.go | Builds and manages dataset metadata for query templating |
| pkg/logql/bench/metadata_resolver.go | Resolves query variables (${SELECTOR}, etc.) based on dataset metadata |
| pkg/logql/bench/metadata_test.go | Tests for metadata building, saving, and loading |
| pkg/logql/bench/metadata_resolver_test.go | Tests for variable resolution logic |
| pkg/logql/bench/integration_metadata_test.go | Integration tests for metadata generation workflow |
| pkg/logql/bench/store_chunk.go | Adds registerer parameter and metrics unregistration |
| pkg/logql/bench/store.go | Integrates metadata generation into data builder |
| pkg/logql/bench/generator.go | Renames Application to Service, adds Format field to StreamMetadata |
| pkg/logql/bench/faker.go | Adds LogFormat enum and Format field to Service definitions |
| pkg/logql/bench/bench_test.go | Updates tests to use query registry instead of hard-coded generator |
| pkg/logql/bench/cmd/bench/main.go | Updates CLI to load and expand queries from registry |
| pkg/logql/bench/queries/schema.json | JSON schema for validating query YAML files |
| pkg/logql/bench/queries/*/**.yaml | YAML files defining queries for different test suites |


trevorwhitney force-pushed the twhitney/improve-logql-benchmarks branch from 2818f7a to e02f77d (February 17, 2026 22:37)
- "pkg/compute/**"
- "pkg/dataobj/**"
- "pkg/engine/**"
- "pkg/logql/bench/**"
Collaborator (Author):
this was all I added, but I also alphabetized them

The test jobs were checking out the latest commit on the branch while trying to download artifacts generated from an older commit (when using workflow_dispatch with a specific ref). This caused artifact-not-found errors.

Fix by:
- Adding the commit SHA as an output from the generate-testdata job
- Using that commit SHA in the test jobs for both checkout and artifact names
- Ensuring all jobs use the exact same commit SHA
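A minimal sketch of that pattern, assuming standard GitHub Actions syntax (the job names and artifact naming scheme are illustrative, not the repository's actual workflow):

```yaml
jobs:
  generate-testdata:
    runs-on: ubuntu-latest
    outputs:
      # Expose the checked-out commit SHA so downstream jobs can pin to it.
      sha: ${{ steps.sha.outputs.sha }}
    steps:
      - uses: actions/checkout@v4
      - id: sha
        run: echo "sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"
      # ... generate the test data and upload it, named with the SHA ...

  test:
    needs: generate-testdata
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out exactly the commit the artifacts were built from.
          ref: ${{ needs.generate-testdata.outputs.sha }}
      - uses: actions/download-artifact@v4
        with:
          name: testdata-${{ needs.generate-testdata.outputs.sha }}
```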

TestPrintBenchmarkQueries calls loadTestCases, which requires dataset metadata that's only generated with the -slow-tests flag. Without the guard, the test fails in CI when the metadata file doesn't exist.

This aligns it with other tests like TestStorageEquality that also require generated data.

The previous fix added a slow-test guard, but this test should be fast and not require the generated dataset file. Instead, generate the metadata in memory by creating a Generator and calling generateStreamMetadata().

This restores the original behavior where the test runs quickly without needing the data/dataset_metadata.json file, while still using the new query registry infrastructure.

Changes:
- Add a GenerateInMemoryMetadata() helper to create metadata without file I/O
- Update TestPrintBenchmarkQueries to generate metadata in memory
- Remove the slow-test guard since the test is fast again
Comment on lines 20 to 76
```go
// Bounded sets: characteristics common to all datasets
// These define the set of possible queries
var (
	// unwrappableFields are numeric fields that can be used with | unwrap
	// Mapped from applications:
	unwrappableFields = []string{
		"bytes",
		"duration",
		"rows_affected",
		"size",
		"spans",
		"status",
		"streams",
		"ttl",
	}

	// filterableKeywords are strings that commonly appear in log content
	// Used for line filter queries like |= "level" or |~ "error"
	filterableKeywords = []string{
		"DEBUG",
		"ERROR",
		"INFO",
		"WARN",
		"debug",
		"duration",
		"error",
		"failed",
		"info",
		"level",
		"status",
		"success",
		"warn",
	}

	// structuredMetadataKeys are keys used in structured metadata
	// These appear as structured_metadata in log entries
	structuredMetadataKeys = []string{
		"detected_level",
		"pod",
		"span_id",
		"trace_id",
	}

	// labelKeys are indexed or parsed labels
	// Used for label selectors and/or grouping (by/without)
	labelKeys = []string{
		"cluster",
		"component",
		"detected_level",
		"env",
		"level",
		"namespace",
		"pod",
		"region",
		"service_name",
	}
)
```
Contributor:

these variables are not used, do you want to keep them for info?

Collaborator (Author):

Ahh, thank you. I meant to circle back around and validate that the query requirements were in this set; done.

```yaml
kind: metric
time_range:
  length: 24h
  step: 1m
```
Contributor:

Maybe not part of this PR, but we need a few test cases for metric queries with different combinations of step and interval values: step < interval, step == interval, step > interval. We have had bugs there before that our correctness tests did not catch.

Contributor:

Specifically when the query length (e.g. 24h) is not evenly divisible by the step (e.g. 1m26s).
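For illustration, sketches of what such followup cases might look like, reusing the kind and time_range fields from the snippet above (treat the exact layout as an assumption; the real format is governed by pkg/logql/bench/queries/schema.json):

```yaml
# Hypothetical followup cases. "interval" here means the range
# interval inside the query expression (e.g. [5m]); "step" is the
# evaluation step of the metric query.

# step < interval (query range assumed to be [5m])
kind: metric
time_range:
  length: 24h
  step: 1m
---
# step == interval
kind: metric
time_range:
  length: 24h
  step: 5m
---
# step > interval
kind: metric
time_range:
  length: 24h
  step: 15m
---
# length not evenly divisible by step
kind: metric
time_range:
  length: 24h
  step: 1m26s
```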

Collaborator (Author):

let's do that in a followup

Adds extractSelectorFromQuery() to extract selectors from queries, and falls back to default ranges (5m/1m) when selector resolution fails.

Replaced all hardcoded service_name selectors (e.g., {service_name="loki"}) with the ${SELECTOR} placeholder in queries that use ${RANGE}. This allows the variable resolver to choose appropriate streams based on requirements.

Changes:
- Replaced 70+ hardcoded selectors with ${SELECTOR}
- Removed incorrect 'labels' requirements (level is parsed, not a stream label)
- Fixed drilldown queries to use 'structured_metadata' for detected_level
- Maintained all log_format and unwrappable_fields requirements

This is cleaner than parsing selectors from queries and properly uses the variable resolution system.
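A hedged before/after sketch of that substitution (the query and requirements field names are assumptions; the placeholders and the example selector are taken from the commits above):

```yaml
# Before: selector hard-coded to one dataset
query: 'sum(rate({service_name="loki"} [${RANGE}]))'
---
# After: the variable resolver picks a stream matching the requirements
query: 'sum(rate(${SELECTOR} | logfmt [${RANGE}]))'
requirements:
  log_format: logfmt
```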