Pull request overview
Adds new benchmark operations and challenges to measure ES|QL's upcoming `LIMIT ... BY` behavior, including a comparison against a roughly equivalent DSL approach in the `nyc_taxis` track and scalability coverage in the `esql` track.
Changes:
- Add `LIMIT ... BY` ES|QL operations for `nyc_taxis` (segment vs doc partitioning) and corresponding challenge schedule entries.
- Add `LIMIT ... BY key_1000` ES|QL operations to the `esql` track and wire them into the `esql_large_schema` challenge.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `nyc_taxis/operations/default.json` | Adds DSL + ES\|QL `LIMIT ... BY` operations. |
| `nyc_taxis/challenges/default.json` | Adds the new `LIMIT ... BY` operations to the `nyc_taxis` challenge schedule. |
| `esql/operations/default.json` | Adds ES\|QL `LIMIT ... BY key_1000` operations. |
| `esql/challenges/default.json` | Runs the new `LIMIT ... BY` operations in the `esql_large_schema` challenge. |
nyc_taxis/operations/default.json
Outdated
    "size": 1000,
    "sort": [{ "pickup_datetime": "desc" }],
    "_source": false,
    "fields": ["pickup_datetime", "dropoff_datetime", "trip_distance"]
Using `collapse` with `inner_hits.size: 1000` returns the top collapsed hit plus up to 1000 inner hits per `payment_type` (i.e., up to 1001 docs per group), which isn't directly comparable to `LIMIT 1000 BY payment_type`. Also, `payment_type` isn't included in `inner_hits.fields`, so most returned hits won't carry the grouping key. Consider adjusting `inner_hits.size` (e.g., to 999) and/or aligning the returned fields so that this query benchmark matches the ES|QL benchmark's cardinality and projection more closely.
Suggested change:

    - "size": 1000,
    - "sort": [{ "pickup_datetime": "desc" }],
    - "_source": false,
    - "fields": ["pickup_datetime", "dropoff_datetime", "trip_distance"]
    + "size": 999,
    + "sort": [{ "pickup_datetime": "desc" }],
    + "_source": false,
    + "fields": ["pickup_datetime", "dropoff_datetime", "trip_distance", "payment_type"]
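To make the cardinality argument in this comment concrete, here is a minimal Python sketch. It is not Elasticsearch code: `limit_by` is a hypothetical helper that assumes `LIMIT n BY key` keeps at most `n` rows per distinct key value in input order, and `collapse_rows_per_group` simply encodes the counting stated in the comment (one collapsed hit plus `inner_hits.size` inner hits, for a group large enough to fill the inner hits). Neither has been checked against a real cluster.

```python
from collections import defaultdict


def limit_by(rows, key, n):
    """Hypothetical model of `LIMIT n BY key`: keep at most n rows per
    distinct value of `key`, preserving input order (an assumption)."""
    seen = defaultdict(int)
    kept = []
    for row in rows:
        if seen[row[key]] < n:
            seen[row[key]] += 1
            kept.append(row)
    return kept


def collapse_rows_per_group(inner_hits_size):
    """Docs per group as counted in the review comment: one collapsed top
    hit plus `inner_hits_size` inner hits, assuming the group has at least
    that many documents."""
    return 1 + inner_hits_size


# Toy rows standing in for nyc_taxis documents.
rows = [
    {"payment_type": p, "trip": i}
    for i, p in enumerate(["1", "2", "1", "1", "2"])
]
limited = limit_by(rows, "payment_type", 2)
print([r["trip"] for r in limited])

# Per-group counts under the comment's model, for the two sizes discussed.
print(collapse_rows_per_group(1000), collapse_rows_per_group(999))
```

Under these assumptions, `inner_hits.size: 999` is the value that lines the DSL query up with `LIMIT 1000 BY payment_type` for large groups, which is the adjustment the suggestion proposes.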
Draft
To be merged after the `LIMIT BY` feature is released (in tech preview): elastic/elasticsearch#112918
Added benchmarks similar to the existing "Large Schemas" (LIMIT) and TopN (SORT + LIMIT) ones.
Local benchmark results
LIMIT BY
SORT + LIMIT BY