
Conversation

yuancu (Collaborator) commented Sep 28, 2025

Description

This PR pushes down CASE functions used in aggregations to OpenSearch range aggregations.

For example, the query source=bank | eval age_range = case (age < 30, 'u30', age < 40, 'u40' else 'u100') | stats avg(balance) by age_range will be pushed down as the following OpenSearch DSL:

{
  "aggregations": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "key": "u30",
            "to": 30
          },
          {
            "key": "u40",
            "from": 30,
            "to": 40
          },
          {
            "key": "u100",
            "from": 40
          }
        ],
        "keyed": true
      },
      "aggregations": {
        "avg(balance)": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

A CASE function used in an aggregation can be pushed down only if it satisfies the following criteria (see the builder sketch after this list):

  • Result expressions must be string literals
  • All conditions must reference the same field
  • The referenced field must be numeric
  • Ranges must be left-closed, right-open intervals: $[a, b)$, $[a, +\infty)$, or $(-\infty, b)$
  • No further operations can be performed on the result of the CASE function
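
For reference, a CASE expression that satisfies these criteria maps onto OpenSearch's range aggregation builder roughly as follows. This is a minimal hand-written sketch using the public AggregationBuilders API, not the translation code added in this PR; variable names are illustrative only:

import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.range.RangeAggregationBuilder;

// case(age < 30, 'u30', age < 40, 'u40' else 'u100') | stats avg(balance) by age_range
RangeAggregationBuilder ageRange = AggregationBuilders.range("age_range")
    .field("age")                    // every condition compares the same numeric field
    .addUnboundedTo("u30", 30)       // (-inf, 30)
    .addRange("u40", 30, 40)         // [30, 40)
    .addUnboundedFrom("u100", 40)    // [40, +inf)
    .keyed(true);                    // keyed buckets, so bucket keys match the CASE result literals
ageRange.subAggregation(AggregationBuilders.avg("avg(balance)").field("balance"));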

Limitations:

  • It only handles cases where the result expression is a string literal. E.g. case(balance<10, 'poor' else 'rich') will be pushed down, while case(balance<10, 10 else 100) won't.
  • Red flag: a range aggregation ignores null values. E.g. eval b = case(balance<10, 'poor' else 'rich') | stats avg(balance) by b will not behave correctly when balance contains null values. With the case function, null values are categorized into the else group; with the pushed-down aggregation, rows with a null balance are ignored.
  • With the case function, the default else group is null. However, since null cannot be a bucket key in a range aggregation, we substitute it with "null". This can be fixed later by assigning a secret key to the else group and substituting it back when parsing the response, as sketched below.
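
A minimal sketch of the sentinel-key idea from the last bullet, assuming a hypothetical ELSE_SENTINEL constant and a hypothetical restoreNullKey post-processing step over the parsed bucket keys (the parser in this PR does not do this yet):

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sentinel assigned to the default ELSE range instead of the literal "null".
static final String ELSE_SENTINEL = "__case_else__";

// After parsing the keyed range response, map the sentinel back to a real null group key.
static Map<String, Object> restoreNullKey(Map<String, Object> parsedBuckets) {
  Map<String, Object> restored = new LinkedHashMap<>();
  parsedBuckets.forEach(
      (key, value) -> restored.put(ELSE_SENTINEL.equals(key) ? null : key, value));
  return restored;
}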

Examples of generated DSL

Case 1: Group by the case field only, with sub-aggregations
source=bank | eval age_range = case (age < 30, 'u30', age < 40, 'u40' else 'u100') | stats avg(balance) by age_range
RangeAgg
    MetricAgg
{
  "aggregations": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "key": "u30",
            "to": 30
          },
          {
            "key": "u40",
            "from": 30,
            "to": 40
          },
          {
            "key": "u100",
            "from": 40
          }
        ],
        "keyed": true
      },
      "aggregations": {
        "avg(balance)": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
Case 2: Group by multiple ranges with sub-aggregations
source=bank | eval age_range = case (age < 30, 'u30', age < 35, 'u35', age < 40, 'u40', age >= 40, 'u100'), balance_range = case(balance < 20000, 'medium' else 'high') | stats avg(balance) by age_range, balance_range
RangeAgg
    RangeAgg
        MetricAgg
{
  "aggregations": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "key": "u30",
            "to": 30
          },
          {
            "key": "u35",
            "from": 30,
            "to": 35
          },
          {
            "key": "u40",
            "from": 35,
            "to": 40
          },
          {
            "key": "u100",
            "from": 40
          }
        ],
        "keyed": true
      },
      "aggregations": {
        "balance_range": {
          "range": {
            "field": "balance",
            "ranges": [
              {
                "key": "medium",
                "to": 20000
              },
              {
                "key": "high",
                "from": 20000
              }
            ],
            "keyed": true
          },
          "aggregations": {
            "avg(balance)": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
Case 3: Group by case field and keyword fields
source=bank | eval age_range = case (age < 30, 'u30', age < 35, 'u35', age < 40, 'u40', age >= 40, 'u100') | stats avg(balance), count() by firstname, lastname, age_range
CompositeAgg
    RangeAgg
        MetricAgg
{
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "firstname": {
              "terms": {
                "field": "firstname",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          },
          {
            "lastname": {
              "terms": {
                "field": "lastname",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "age_range": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "key": "u30",
                "to": 30
              },
              {
                "key": "u35",
                "from": 30,
                "to": 35
              },
              {
                "key": "u40",
                "from": 35,
                "to": 40
              },
              {
                "key": "u100",
                "from": 40
              }
            ],
            "keyed": true
          },
          "aggregations": {
            "avg(balance)": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

TODOs:

  • Fix the discrepancy for null expression results (in the pushed-down version, the key is "null" instead of null); currently left as a limitation.
  • Test cases where the result expressions are numbers
  • Test cases that contain null values
  • Javadoc for newly added interfaces
  • Refactor bucket parsers, etc. once "Fallback to sub-aggregation if composite aggregation doesn't support" (#4413) is merged
  • Unify how sub-aggregations are created across AutoDataHistogramAggregation, RangeAggregation and CompositeAggregation

Related Issues

Resolves #4201, partially resolves #4338

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuancu added the enhancement (New feature or request) label on Sep 28, 2025
- Additionally test number as result expressions

Signed-off-by: Yuanchun Shen <[email protected]>
  return Pair.of(
      Collections.singletonList(aggregationBuilder),
-     new CompositeAggregationParser(metricParserList, countAggNames));
+     new BucketAggregationParser(metricParsers, countAggNames));
Collaborator:

Does it mean we don't need CompositeAggregationParser anymore?

Collaborator Author:

Yes, I'm thinking of retiring CompositeAggregationParser since a composite aggregation is essentially also a bucket aggregation. By keeping only BucketAggregationParser, we can avoid duplicating code to handle sub-aggregations, etc.

      Collections.singletonList(aggregationBuilder),
-     new CompositeAggregationParser(metricParserList, countAggNames));
+     new BucketAggregationParser(metricParsers, countAggNames));
  } catch (CompositeAggUnSupportedException e) {
Collaborator:

Is there any logic for handling the range bucket when this exception happens? This part of the code constructs buckets that composite agg doesn't support, like auto_date_histogram. I'm wondering what happens if the query has both auto_span and range_bucket.

Collaborator:

Please add a test case like bin timestamp bins=xxx | eval age_range = case ... | stats count() by timestamp, age_range

Collaborator Author (@yuancu, Oct 13, 2025):

I haven't handled the case where the auto date histogram and range buckets coexist. Currently, it treats age_range as a terms sub-aggregation with a script.

explain results for `source=time_test | bin timestamp bins=3 | eval value_range = case(value < 7000, 'small' else 'great') | stats bucket_nullable=false avg(value) by timestamp, value_range`
{
  "calcite": {
    "logical": """LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(avg(value)=[$2], timestamp=[$0], value_range=[$1])
    LogicalAggregate(group=[{0, 1}], avg(value)=[AVG($2)])
      LogicalProject(timestamp=[$9], value_range=[$10], value=[$2])
        LogicalFilter(condition=[IS NOT NULL($9)])
          LogicalProject(@timestamp=[$0], category=[$1], value=[$2], _id=[$4], _index=[$5], _score=[$6], _maxscore=[$7], _sort=[$8], _routing=[$9], timestamp=[WIDTH_BUCKET($3, 3, -(MAX($3) OVER (), MIN($3) OVER ()), MAX($3) OVER ())], value_range=[CASE(<($2, 7000), 'small':VARCHAR, 'great':VARCHAR)])
            CalciteLogicalIndexScan(table=[[OpenSearch, time_test]])
""",
    "physical": """EnumerableLimit(fetch=[10000])
  CalciteEnumerableIndexScan(table=[[OpenSearch, time_test]], PushDownContext=[[AGGREGATION->rel#2751:LogicalAggregate.NONE.[](input=RelSubset#2693,group={1, 2},avg(value)=AVG($0)), PROJECT->[avg(value), timestamp, value_range]], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"timestamp":{"auto_date_histogram":{"field":"timestamp","buckets":3,"minimum_interval":null},"aggregations":{"value_range":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXNyABFqYXZhLnV0aWwuQ29sbFNlcleOq7Y6G6gRAwABSQADdGFneHAAAAADdwQAAAAGdAAHcm93VHlwZXQERHsKICAiZmllbGRzIjogWwogICAgewogICAgICAidWR0IjogIkVYUFJfVElNRVNUQU1QIiwKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAiQHRpbWVzdGFtcCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogImNhdGVnb3J5IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogInZhbHVlIgogICAgfSwKICAgIHsKICAgICAgInVkdCI6ICJFWFBSX1RJTUVTVEFNUCIsCiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogInRpbWVzdGFtcCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogIl9pZCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogIl9pbmRleCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlJFQUwiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAibmFtZSI6ICJfc2NvcmUiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJSRUFMIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAiX21heHNjb3JlIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiQklHSU5UIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAiX3NvcnQiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJfcm91dGluZyIKICAgIH0KICBdLAogICJudWxsYWJsZSI6IHRydWUKfXQABGV4cHJ0Atp7CiAgIm9wIjogewogICAgIm5hbWUiOiAiQ0FTRSIsCiAgICAia2luZCI6ICJDQVNFIiwKICAgICJzeW50YXgiOiAiU1BFQ0lBTCIKICB9LAogICJvcGVyYW5kcyI6IFsKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIjwiLAogICAgICAgICJraW5kIjogIkxFU1NfVEhBTiIsCiAgICAgICAgInN5bnRheCI6ICJCSU5BUlkiCiAgICAgIH0sCiAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICB7CiAgICAgICAgICAiaW5wdXQiOiAyLAogICAgICAgICAgIm5hbWUiOiAiJDIiCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAibGl0ZXJhbCI6IDcwMDAsCiAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IGZhbHNlCiAgICAgICAgICB9CiAgICAgICAgfQogICAgICBdCiAgICB9LAogICAgewogICAgICAibGl0ZXJhbCI6ICJzbWFsbCIsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9LAogICAgewogICAgICAibGl0ZXJhbCI6ICJncmVhdCIsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9CiAgXQp9dAAKZmllbGRUeXBlc3NyABdqYXZhLnV0aWwuTGlua2VkSGFzaE1hcDTATlwQbMD7AgABWgALYWNjZXNzT3JkZXJ4cgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAMdwgAAAAQAAAABHQACkB0aW1lc3RhbXBzcgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRlVHlwZZ4tUq4QfcqvAgABTAAHZm9ybWF0c3QAEExqYXZhL3V0aWwvTGlzdDt4cgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRhVHlwZcJjvMoC+gU1AgADTAAMZXhwckNvcmVUeXBldAArTG9yZy9vcGVuc2VhcmNoL3NxbC9kYXRhL3R5cGUvRXhwckNv
cmVUeXBlO0wAC21hcHBpbmdUeXBldABITG9yZy9vcGVuc2VhcmNoL3NxbC9vcGVuc2VhcmNoL2RhdGEvdHlwZS9PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGU7TAAKcHJvcGVydGllc3QAD0xqYXZhL3V0aWwvTWFwO3hwfnIAKW9yZy5vcGVuc2VhcmNoLnNxbC5kYXRhLnR5cGUuRXhwckNvcmVUeXBlAAAAAAAAAAASAAB4cgAOamF2YS5sYW5nLkVudW0AAAAAAAAAABIAAHhwdAAJVElNRVNUQU1QfnIARm9yZy5vcGVuc2VhcmNoLnNxbC5vcGVuc2VhcmNoLmRhdGEudHlwZS5PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGUAAAAAAAAAABIAAHhxAH4AE3QABERhdGVzcgA8c2hhZGVkLmNvbS5nb29nbGUuY29tbW9uLmNvbGxlY3QuSW1tdXRhYmxlTWFwJFNlcmlhbGl6ZWRGb3JtAAAAAAAAAAACAAJMAARrZXlzdAASTGphdmEvbGFuZy9PYmplY3Q7TAAGdmFsdWVzcQB+ABp4cHVyABNbTGphdmEubGFuZy5PYmplY3Q7kM5YnxBzKWwCAAB4cAAAAAB1cQB+ABwAAAAAc3IAE2phdmEudXRpbC5BcnJheUxpc3R4gdIdmcdhnQMAAUkABHNpemV4cAAAAAF3BAAAAAF0ABdkYXRlX2hvdXJfbWludXRlX3NlY29uZHh0AAhjYXRlZ29yeX5xAH4AEnQABlNUUklOR3QABXZhbHVlfnEAfgASdAAHSU5URUdFUnQACXRpbWVzdGFtcHNxAH4AC3EAfgAUcQB+ABdxAH4AG3NxAH4AHwAAAAF3BAAAAAF0ABdkYXRlX2hvdXJfbWludXRlX3NlY29uZHh4AHg=\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1760342401776012000}},"size":1000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":{"_key":"asc"}},"aggregations":{"avg(value)":{"avg":{"field":"value"}}}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
"""
  }
}

I'll optimize this case and move the snippets of creating range buckets to createNestedBuckets.
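
For reference, the intended pushed-down shape nests the range aggregation under the auto date histogram instead of falling back to a scripted terms sub-aggregation. A minimal hand-written sketch with OpenSearch's aggregation builders, assuming the same buckets as the query above (this is not the actual createNestedBuckets implementation):

import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.histogram.AutoDateHistogramAggregationBuilder;
import org.opensearch.search.aggregations.bucket.range.RangeAggregationBuilder;

// bin timestamp bins=3 | eval value_range = case(value < 7000, 'small' else 'great')
// | stats avg(value) by timestamp, value_range
RangeAggregationBuilder valueRange = AggregationBuilders.range("value_range")
    .field("value")
    .addUnboundedTo("small", 7000)     // (-inf, 7000)
    .addUnboundedFrom("great", 7000)   // [7000, +inf)
    .keyed(true);
valueRange.subAggregation(AggregationBuilders.avg("avg(value)").field("value"));

AutoDateHistogramAggregationBuilder timestampBins =
    new AutoDateHistogramAggregationBuilder("timestamp").setNumBuckets(3);
timestampBins.field("timestamp");
timestampBins.subAggregation(valueRange);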

Collaborator Author:

Fixed & test case added

List<String> countAggNames = countAggNameAndBuilderPair.getLeft();

if (aggregate.getGroupSet().isEmpty()) {
Pair<Set<Integer>, AggregationBuilder> caseAggPushedAndRangeBuilder =
Collaborator:

Why should we handle the range bucket separately? Can it be handled in createNestedBuckets, like how we construct auto_date_span?

Collaborator:

It seems composite agg is more appropriate than constructing multiple sub-aggs via createNestedBuckets. It looks good to me to keep the current implementation. #4400 (comment).

We should refactor auto_date_span to have a similar implementation, but we still have to use createNestedBuckets when a query has both range_bucket and auto_date_span.

Collaborator Author:

Fixed

} else {
new BucketAggregationParser(metricParsers, countAggNames));
}
// It has both composite aggregation and range aggregation:
Collaborator:

Not related to this PR but to #4413: it seems we have a chance to optimize sub-aggs by combining the buckets that composite agg supports into a composite bucket.

e.g. transform termBucket-termBucket-autoDateSpanBucket to compositeBucket - autoDateSpanBucket.

We can also reorder buckets to extend this optimization to more cases.

e.g. transform termBucket-autoDateSpanBucket-termBucket to compositeBucket - autoDateSpanBucket.

Collaborator Author:

Fixed with the latest implementation

qianheng-aws previously approved these changes Oct 13, 2025
Comment on lines +230 to +233
Limitations
>>>>>>>>>>>

When each condition is a field comparison with a numeric literal and each result expression is a string literal, the query will be optimized as `range aggregations <https://docs.opensearch.org/latest/aggregations/bucket/range>`_ if pushdown optimization is enabled. However, this optimization has the following limitations:
Member:

IMO, it's not a limitation of the case function; it is just a restricted optimization. We can just call out in which cases the case function would be optimized to range DSL. Can we add some optimizable case usages to the user doc?

Collaborator Author:

I think the stated conditions are in the scope of restricted optimization, but the limitations are not, because we will still do the optimization regardless of whether the column contains null values or whether there is a default NULL range.

The problem is that there is no way to know in advance whether a column contains null values. Therefore, if we do this optimization, we always risk a discrepancy between the results with and without push-down.

- Null values will not be grouped into any bucket of a range aggregation and will be ignored
- The default ELSE clause will use the string literal ``"null"`` instead of actual NULL values

To avoid these edge-case limitations, set ``plugins.calcite.pushdown.enabled`` to false.
Member:

remove this

Collaborator Author:

Removed

Signed-off-by: Yuanchun Shen <[email protected]>
Comment on lines 219 to 228
// Push auto date span & case in group-by list into nested aggregations
Pair<Set<Integer>, AggregationBuilder> aggPushedAndAggBuilder =
    createNestedAggregation(groupList, project, subBuilder, helper);
Set<Integer> aggPushed = aggPushedAndAggBuilder.getLeft();
AggregationBuilder pushedAggBuilder = aggPushedAndAggBuilder.getRight();
// The group-by list after removing pushed aggregations
groupList = groupList.stream().filter(i -> !aggPushed.contains(i)).toList();
if (pushedAggBuilder != null) {
  subBuilder = new Builder().addAggregator(pushedAggBuilder);
}
Collaborator:

[non-blocking] Should this part of the code be put in the second branch of the if below for better readability? It would have a structure like:

if (aggregate.getGroupSet().isEmpty()) {
  // no group by
} else {
    // Push auto date span & case in group-by list into nested aggregations
   ...
   ...
   if (groupList.isEmpty()) {
        // No composite aggregation at top-level
        ...
   } else {
        // Composite aggregation at top level
        ...
    }
}

It works well with the current code, but it performs useless operations on some empty collections in the no-group-by case.

Collaborator Author:

Thanks for the suggestion! Optimized the logic here.

}
} else if (bucket instanceof Range.Bucket) {
// return null so that an empty range will be filtered out
if (bucket.getDocCount() == 0) {
Collaborator:

Put this at the beginning of this method to skip meaningless operations in advance?

Collaborator:

Please include InternalAutoDateHistogram.Bucket here as well to skip empty buckets for auto_date_span.

And also remove the code

if (aggregationBuilder.getLeft().size() == 1
    && aggregationBuilder.getLeft().getFirst()
        instanceof AutoDateHistogramAggregationBuilder autoDateHistogram) {
  // If it's auto_date_histogram, filter the empty bucket by using the first aggregate metrics
  RexBuilder rexBuilder = getCluster().getRexBuilder();
  Optional<AggregationBuilder> aggBuilderOpt =
      autoDateHistogram.getSubAggregations().stream().toList().stream().findFirst();
  RexNode condition =
      aggBuilderOpt.isEmpty() || aggBuilderOpt.get() instanceof ValueCountAggregationBuilder
          ? rexBuilder.makeCall(
              SqlStdOperatorTable.GREATER_THAN,
              rexBuilder.makeInputRef(newScan, 1),
              rexBuilder.makeLiteral(
                  0, rexBuilder.getTypeFactory().createSqlType(SqlTypeName.INTEGER)))
          : rexBuilder.makeCall(
              SqlStdOperatorTable.IS_NOT_NULL, rexBuilder.makeInputRef(newScan, 1));
  return LogicalFilter.create(newScan, condition);
}
here. Previously, we added another filter to filter out the empty bucket.

Collaborator Author:

Refactored as suggested

return null;
}
// the content of the range bucket is extracted with `r.put(name, bucket.getKey())` below
}
Collaborator:

Should there be another else here? Otherwise it will put bucket.getKey() into the results for the composite agg as well.

Collaborator Author:

Yep. I previously put all contents in bucket.getKey() to pass unit tests, which was wrong; the behavior is corrected now.

rows(39225.0, 1, "a30", "IL", "M"),
rows(48086.0, 1, "a30", "IN", "F"));

// 2.4 Composite (2 fields) - Range - Range - Metric (with count)
Collaborator:

Please add a test case for composite - auto_date_span - range - metric. Set bins to a value large enough, like 100, to verify that the empty buckets are filtered out properly.

Collaborator Author:

Test added

Merging this pull request may close these issues: [FEATURE] Support case command pushdown to range agg; [FEATURE] Support nested sub-aggregation parser.