chore: refactor the logql benchmark suite #20845
Conversation
Pull request overview
This PR refactors the LogQL benchmark suite to use a more flexible query registry system instead of hard-coded query generation. The changes enable running different benchmark suites (fast, regression, exhaustive) with queries defined in YAML files that are templated against actual dataset metadata.
Changes:
- Introduces a query registry system with YAML-based query definitions organized into three suites: fast (local sanity checks), regression (CI), and exhaustive (nightly runs)
- Adds dataset metadata generation and loading to support query templating with actual stream selectors, fields, and labels
- Replaces the hard-coded `TestCaseGenerator` with a flexible `QueryRegistry` and `MetadataVariableResolver` system; a rough sketch of such a query definition follows this list
- Refactors the generator to track stream formats and adds metrics unregistration to avoid test conflicts
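For orientation, a minimal sketch of what one of these YAML query definitions and its Go counterpart might look like. The `kind`, `time_range`, `${SELECTOR}`, and `${RANGE}` pieces come from this PR; the remaining field names are assumptions, not the actual schema in `queries/schema.json`.

```go
package main

import (
    "fmt"

    "gopkg.in/yaml.v3"
)

// queryDef is an illustrative shape for one entry in a suite's YAML file;
// only kind and time_range appear in this PR's snippets, the rest is guessed.
type queryDef struct {
    Name      string `yaml:"name"`
    Kind      string `yaml:"kind"` // "log" or "metric"
    Query     string `yaml:"query"`
    TimeRange struct {
        Length string `yaml:"length"`
        Step   string `yaml:"step"`
    } `yaml:"time_range"`
    Requirements map[string][]string `yaml:"requirements"`
}

const example = `
name: rate_by_detected_level
kind: metric
query: 'sum by (detected_level) (rate(${SELECTOR} | logfmt [${RANGE}]))'
time_range:
  length: 24h
  step: 1m
requirements:
  log_format: [logfmt]
`

func main() {
    // Parse one templated query definition the way a registry loader might.
    var q queryDef
    if err := yaml.Unmarshal([]byte(example), &q); err != nil {
        panic(err)
    }
    fmt.Printf("%s (%s): %s\n", q.Name, q.Kind, q.Query)
}
```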
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| pkg/logql/bench/testcase.go | Moves TestCase struct definition from deleted generator_query.go file |
| pkg/logql/bench/query_registry.go | Implements query registry for loading and expanding YAML query definitions |
| pkg/logql/bench/metadata.go | Builds and manages dataset metadata for query templating |
| pkg/logql/bench/metadata_resolver.go | Resolves query variables (${SELECTOR}, etc.) based on dataset metadata |
| pkg/logql/bench/metadata_test.go | Tests for metadata building, saving, and loading |
| pkg/logql/bench/metadata_resolver_test.go | Tests for variable resolution logic |
| pkg/logql/bench/integration_metadata_test.go | Integration tests for metadata generation workflow |
| pkg/logql/bench/store_chunk.go | Adds registerer parameter and metrics unregistration |
| pkg/logql/bench/store.go | Integrates metadata generation into data builder |
| pkg/logql/bench/generator.go | Renames Application to Service, adds Format field to StreamMetadata |
| pkg/logql/bench/faker.go | Adds LogFormat enum and Format field to Service definitions |
| pkg/logql/bench/bench_test.go | Updates tests to use query registry instead of hard-coded generator |
| pkg/logql/bench/cmd/bench/main.go | Updates CLI to load and expand queries from registry |
| pkg/logql/bench/queries/schema.json | JSON schema for validating query YAML files |
| pkg/logql/bench/queries/*/**.yaml | YAML files defining queries for different test suites |
| - "pkg/compute/**" | ||
| - "pkg/dataobj/**" | ||
| - "pkg/engine/**" | ||
| - "pkg/logql/bench/**" |
this was all I added, but I also alphabetized them
The test jobs were checking out the latest commit on the branch while trying to download artifacts generated from an older commit (when using workflow_dispatch with a specific ref). This caused "artifact not found" errors. Fix by:
- Adding the commit SHA as an output of the generate-testdata job
- Using that commit SHA in the test jobs for both checkout and artifact names
- Ensuring all jobs use the exact same commit SHA
TestPrintBenchmarkQueries calls loadTestCases which requires dataset metadata that's only generated with the -slow-tests flag. Without the guard, the test fails in CI when the metadata file doesn't exist. This aligns it with other tests like TestStorageEquality that also require generated data.
The previous fix added a slow-test guard, but this test should be fast and not require the generated dataset file. Instead, generate metadata in memory by creating a Generator and calling generateStreamMetadata(). This restores the original behavior where the test runs quickly without needing the data/dataset_metadata.json file, while still using the new query registry infrastructure.
Changes:
- Add GenerateInMemoryMetadata() helper to create metadata without file I/O
- Update TestPrintBenchmarkQueries to generate metadata in-memory
- Remove the slow-test guard since the test is now fast again
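For context, a minimal, self-contained sketch of the "metadata in memory" idea. The type shapes and example values are illustrative assumptions, not the real structures produced by `Generator.generateStreamMetadata()`.

```go
package bench

// Illustrative stand-ins for the real metadata types; shapes are assumed.
type StreamMeta struct {
    Selector string // e.g. {service_name="loki"}
    Format   string // e.g. "logfmt" or "json"
}

type DatasetMetadata struct {
    Streams        []StreamMeta
    Labels         []string
    StructuredMeta []string
}

// generateInMemoryMetadata mimics the idea behind GenerateInMemoryMetadata():
// build dataset metadata directly from the generator's configuration instead
// of loading data/dataset_metadata.json, so fast tests do not depend on the
// -slow-tests data generation step.
func generateInMemoryMetadata() DatasetMetadata {
    return DatasetMetadata{
        Streams: []StreamMeta{
            {Selector: `{service_name="loki"}`, Format: "logfmt"},
        },
        Labels:         []string{"cluster", "namespace", "service_name"},
        StructuredMeta: []string{"detected_level", "trace_id"},
    }
}
```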
```go
// Bounded sets: characteristics common to all datasets
// These define the set of possible queries
var (
    // unwrappableFields are numeric fields that can be used with | unwrap
    // Mapped from applications:
    unwrappableFields = []string{
        "bytes",
        "duration",
        "rows_affected",
        "size",
        "spans",
        "status",
        "streams",
        "ttl",
    }

    // filterableKeywords are strings that commonly appear in log content
    // Used for line filter queries like |= "level" or |~ "error"
    filterableKeywords = []string{
        "DEBUG",
        "ERROR",
        "INFO",
        "WARN",
        "debug",
        "duration",
        "error",
        "failed",
        "info",
        "level",
        "status",
        "success",
        "warn",
    }

    // structuredMetadataKeys are keys used in structured metadata
    // These appear as structured_metadata in log entries
    structuredMetadataKeys = []string{
        "detected_level",
        "pod",
        "span_id",
        "trace_id",
    }

    // labelKeys are indexed or parsed labels
    // Used for label selectors and/or grouping (by/without)
    labelKeys = []string{
        "cluster",
        "component",
        "detected_level",
        "env",
        "level",
        "namespace",
        "pod",
        "region",
        "service_name",
    }
)
```
these variables are not used, do you want to keep them for info?
ahh, thank you. I meant to circle back around and validate that the query requirements were in this set; done.
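For the record, the validation alluded to here could look roughly like the following; it reuses the bounded-set variables from the diff above, and the shape of the requirements map is an assumption.

```go
package bench

import (
    "fmt"
    "slices"
)

// validateRequirements is a sketch of the "requirements must come from the
// bounded sets" check; it assumes requirements arrive as kind -> values.
func validateRequirements(reqs map[string][]string) error {
    allowed := map[string][]string{
        "unwrappable_fields":  unwrappableFields,
        "filterable_keywords": filterableKeywords,
        "structured_metadata": structuredMetadataKeys,
        "labels":              labelKeys,
    }
    for kind, values := range reqs {
        for _, v := range values {
            if !slices.Contains(allowed[kind], v) {
                return fmt.Errorf("requirement %s=%q is not in the bounded set", kind, v)
            }
        }
    }
    return nil
}
```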
```yaml
kind: metric
time_range:
  length: 24h
  step: 1m
```
Maybe not a part of this PR, but we need a few test cases for metric queries with different combinations of step and interval values: step < interval, step == interval, and step > interval. We have had bugs there before and did not catch them with our correctness tests.
Specifically when the query length (e.g. 24h) is not evenly divisible by the step (e.g. 1m26s).
let's do that in a followup
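To make the edge case concrete, a small standalone illustration (plain duration arithmetic, not benchmark code): with a 24h range and a 1m26s step, the last evaluation point lands 56s short of the range end.

```go
package main

import (
    "fmt"
    "time"
)

func main() {
    length := 24 * time.Hour
    step := 86 * time.Second // 1m26s

    fullSteps := int(length / step)            // 1004 whole steps fit in the range
    covered := step * time.Duration(fullSteps) // 23h59m4s actually covered
    fmt.Printf("full steps: %d, covered: %s, leftover: %s\n",
        fullSteps, covered, length-covered) // leftover: 56s
}
```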
Adds extractSelectorFromQuery() to extract selectors from queries, falling back to the default ranges (5m/1m) when selector resolution fails.
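For reference, a naive version of what such an extraction might do; the real `extractSelectorFromQuery()` may differ, and as the later commits note, this approach was ultimately replaced by `${SELECTOR}` placeholders.

```go
package main

import (
    "fmt"
    "regexp"
)

// selectorRe grabs the first {...} matcher in a query; a rough stand-in
// for whatever the real extraction does.
var selectorRe = regexp.MustCompile(`\{[^}]*\}`)

func extractSelector(query string) (string, bool) {
    sel := selectorRe.FindString(query)
    return sel, sel != ""
}

func main() {
    q := `sum(rate({service_name="loki"} |= "error" [5m]))`
    if sel, ok := extractSelector(q); ok {
        fmt.Println("selector:", sel)
    } else {
        fmt.Println("no selector found, using defaults (5m/1m)")
    }
}
```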
This reverts commit 38e5936.
Replaced all hardcoded service_name selectors (e.g., {service_name="loki"})
with ${SELECTOR} placeholder in queries that use ${RANGE}. This allows the
variable resolver to choose appropriate streams based on requirements.
Changes:
- Replaced 70+ hardcoded selectors with ${SELECTOR}
- Removed incorrect 'labels' requirements (level is parsed, not a stream label)
- Fixed drilldown queries to use 'structured_metadata' for detected_level
- Maintained all log_format and unwrappable_fields requirements
This is cleaner than parsing selectors from queries and properly uses the
variable resolution system.
What this PR does / why we need it:
This PR changes how queries are defined for the LogQL benchmarks. Instead of a hard-coded set of queries that we run using many combinations of selectors, this PR introduces a query registry: a set of three folders containing YAML files that define different sets of queries. These folders aim to separate the queries into three "suites": `fast` for a local sanity check, `regression` for CI, and `exhaustive` for nightly runs against real data.

To enable running these queries against real data, this PR introduces the concept of a metadata file for a dataset, and retrofits the data generator to produce one when generating fake data. The hope is that we can, in the future, write a tool to produce metadata against a snapshot of real data.
The metadata allows the queries to be templated with selectors, fields, structured metadata, and other properties that actually exist in the dataset, making sure the queries can run against different datasets.
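As a rough illustration of that templating, assuming the resolver has already picked concrete values from the dataset metadata (the values and resolution rules below are assumptions, not the actual `MetadataVariableResolver` behavior):

```go
package main

import (
    "fmt"
    "os"
)

func main() {
    // Values a resolver might pick out of data/dataset_metadata.json based
    // on a query's requirements (e.g. log_format: logfmt).
    vars := map[string]string{
        "SELECTOR": `{service_name="loki"}`,
        "RANGE":    "5m",
    }

    query := `sum by (detected_level) (count_over_time(${SELECTOR} | logfmt [${RANGE}]))`

    // Expand ${...} placeholders, leaving any unknown ones untouched.
    expanded := os.Expand(query, func(name string) string {
        if v, ok := vars[name]; ok {
            return v
        }
        return "${" + name + "}"
    })
    fmt.Println(expanded)
    // sum by (detected_level) (count_over_time({service_name="loki"} | logfmt [5m]))
}
```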
Special notes for your reviewer:
Checklist
- Reviewed the `CONTRIBUTING.md` guide (required)
- `feat` PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- If the change deprecates or removes a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. Example PR