
Conversation

yuancu (Collaborator) commented Sep 28, 2025

Description

This PR pushes down CASE functions used in aggregations to OpenSearch range aggregations.

For example, the query source=bank | eval age_range = case (age < 30, 'u30', age < 40, 'u40' else 'u100') | stats avg(balance) by age_range will be pushed down as the following OpenSearch DSL:

{
  "aggregations": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "key": "u30",
            "to": 30
          },
          {
            "key": "u40",
            "from": 30,
            "to": 40
          },
          {
            "key": "u100",
            "from": 40
          }
        ],
        "keyed": true
      },
      "aggregations": {
        "avg(balance)": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

A CASE function used in an aggregation can be pushed down only if it satisfies the following criteria (see the builder sketch after this list):

  • Result expressions must be string literals
  • All conditions must reference the same field
  • The referenced field must be numeric
  • Ranges must be left-closed, right-open intervals: $[a, b)$, $[a, +\infty)$, or $(-\infty, b)$
  • No further operations can be performed on the result of the CASE function
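
For reference, a CASE expression that satisfies these criteria maps onto OpenSearch's range aggregation builder roughly as follows. This is a minimal hand-written sketch using the public AggregationBuilders API, not the translation code added in this PR; variable names are illustrative only:

import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.range.RangeAggregationBuilder;

// case(age < 30, 'u30', age < 40, 'u40' else 'u100') | stats avg(balance) by age_range
RangeAggregationBuilder ageRange = AggregationBuilders.range("age_range")
    .field("age")                    // every condition compares the same numeric field
    .addUnboundedTo("u30", 30)       // (-inf, 30)
    .addRange("u40", 30, 40)         // [30, 40)
    .addUnboundedFrom("u100", 40)    // [40, +inf)
    .keyed(true);                    // keyed buckets, so bucket keys match the CASE result literals
ageRange.subAggregation(AggregationBuilders.avg("avg(balance)").field("balance"));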

Limitations:

  • It only handles cases where the result expression is a string literal. E.g. case(balance<10, 'poor' else 'rich') will be pushed down, while case(balance<10, 10 else 100) won't.
  • Red flag: a range aggregation ignores null values. E.g. eval b = case(balance<10, 'poor' else 'rich') | stats avg(balance) by b will not behave correctly when balance contains null values. With the case function, null values are categorized into the else group; with the pushed-down aggregation, rows with a null balance are ignored.
  • With the case function, the default else group is null. However, since null cannot be a bucket key in a range aggregation, we substitute it with "null". This can be fixed later by assigning a secret key to the else group and substituting it back when parsing the response, as sketched below.
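
A minimal sketch of the sentinel-key idea from the last bullet, assuming a hypothetical ELSE_SENTINEL constant and a hypothetical restoreNullKey post-processing step over the parsed bucket keys (the parser in this PR does not do this yet):

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sentinel assigned to the default ELSE range instead of the literal "null".
static final String ELSE_SENTINEL = "__case_else__";

// After parsing the keyed range response, map the sentinel back to a real null group key.
static Map<String, Object> restoreNullKey(Map<String, Object> parsedBuckets) {
  Map<String, Object> restored = new LinkedHashMap<>();
  parsedBuckets.forEach(
      (key, value) -> restored.put(ELSE_SENTINEL.equals(key) ? null : key, value));
  return restored;
}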

Examples of generated DSL

Case 1: Group by the case field only, with sub-aggregations
source=bank | eval age_range = case (age < 30, 'u30', age < 40, 'u40' else 'u100') | stats avg(balance) by age_range
RangeAgg
    MetricAgg
{
  "aggregations": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "key": "u30",
            "to": 30
          },
          {
            "key": "u40",
            "from": 30,
            "to": 40
          },
          {
            "key": "u100",
            "from": 40
          }
        ],
        "keyed": true
      },
      "aggregations": {
        "avg(balance)": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
Case 2: Group by multiple ranges with sub-aggregations
source=bank | eval age_range = case (age < 30, 'u30', age < 35, 'u35', age < 40, 'u40', age >= 40, 'u100'), balance_range = case(balance < 20000, 'medium' else 'high') | stats avg(balance) by age_range, balance_range
RangeAgg
    RangeAgg
        MetricAgg
{
  "aggregations": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "key": "u30",
            "to": 30
          },
          {
            "key": "u35",
            "from": 30,
            "to": 35
          },
          {
            "key": "u40",
            "from": 35,
            "to": 40
          },
          {
            "key": "u100",
            "from": 40
          }
        ],
        "keyed": true
      },
      "aggregations": {
        "balance_range": {
          "range": {
            "field": "balance",
            "ranges": [
              {
                "key": "medium",
                "to": 20000
              },
              {
                "key": "high",
                "from": 20000
              }
            ],
            "keyed": true
          },
          "aggregations": {
            "avg(balance)": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
Case 3: Group by case field and keyword fields
source=bank | eval age_range = case (age < 30, 'u30', age < 35, 'u35', age < 40, 'u40', age >= 40, 'u100') | stats avg(balance), count() by firstname, lastname, age_range
CompositeAgg
    RangeAgg
        MetricAgg
{
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "firstname": {
              "terms": {
                "field": "firstname",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          },
          {
            "lastname": {
              "terms": {
                "field": "lastname",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "age_range": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "key": "u30",
                "to": 30
              },
              {
                "key": "u35",
                "from": 30,
                "to": 35
              },
              {
                "key": "u40",
                "from": 35,
                "to": 40
              },
              {
                "key": "u100",
                "from": 40
              }
            ],
            "keyed": true
          },
          "aggregations": {
            "avg(balance)": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

TODOs:

  • Fix the discrepancy for null expression results (in the pushed-down version, the key is "null" instead of null); currently left as a limitation.
  • Test cases where the result expressions are numbers
  • Test cases that contain null values
  • Javadoc for newly added interfaces
  • Refactor bucket parsers, etc. once "Fallback to sub-aggregation if composite aggregation doesn't support" (#4413) is merged
  • Unify how sub-aggregations are created across AutoDataHistogramAggregation, RangeAggregation and CompositeAggregation

Related Issues

Resolves #4201, partially resolves #4338

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuancu added the enhancement (New feature or request) label on Sep 28, 2025
- Additionally test number as result expressions

Signed-off-by: Yuanchun Shen <[email protected]>
  return Pair.of(
      Collections.singletonList(aggregationBuilder),
-     new CompositeAggregationParser(metricParserList, countAggNames));
+     new BucketAggregationParser(metricParsers, countAggNames));
Collaborator:

Does it mean we don't need CompositeAggregationParser anymore?

Collaborator Author:

Yes, I'm thinking of retiring CompositeAggregationParser since a composite aggregation is essentially also a bucket aggregation. By keeping only BucketAggregationParser, we can avoid duplicating code to handle sub-aggregations, etc.

      Collections.singletonList(aggregationBuilder),
-     new CompositeAggregationParser(metricParserList, countAggNames));
+     new BucketAggregationParser(metricParsers, countAggNames));
  } catch (CompositeAggUnSupportedException e) {
Collaborator:

Is there any logic for handling the range bucket when this exception happens? This part of the code constructs buckets that composite agg doesn't support, like auto_date_histogram. I'm wondering what happens if the query has both auto_span and range_bucket.

Collaborator:

Please add a test case like bin timestamp bins=xxx | eval age_range = case ... | stats count() by timestamp, age_range

Collaborator Author (@yuancu, Oct 13, 2025):

I haven't handled the case where the auto date histogram and range buckets coexist. Currently, it treats age_range as a terms sub-aggregation with a script.

explain results for `source=time_test | bin timestamp bins=3 | eval value_range = case(value < 7000, 'small' else 'great') | stats bucket_nullable=false avg(value) by timestamp, value_range`
{
  "calcite": {
    "logical": """LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(avg(value)=[$2], timestamp=[$0], value_range=[$1])
    LogicalAggregate(group=[{0, 1}], avg(value)=[AVG($2)])
      LogicalProject(timestamp=[$9], value_range=[$10], value=[$2])
        LogicalFilter(condition=[IS NOT NULL($9)])
          LogicalProject(@timestamp=[$0], category=[$1], value=[$2], _id=[$4], _index=[$5], _score=[$6], _maxscore=[$7], _sort=[$8], _routing=[$9], timestamp=[WIDTH_BUCKET($3, 3, -(MAX($3) OVER (), MIN($3) OVER ()), MAX($3) OVER ())], value_range=[CASE(<($2, 7000), 'small':VARCHAR, 'great':VARCHAR)])
            CalciteLogicalIndexScan(table=[[OpenSearch, time_test]])
""",
    "physical": """EnumerableLimit(fetch=[10000])
  CalciteEnumerableIndexScan(table=[[OpenSearch, time_test]], PushDownContext=[[AGGREGATION->rel#2751:LogicalAggregate.NONE.[](input=RelSubset#2693,group={1, 2},avg(value)=AVG($0)), PROJECT->[avg(value), timestamp, value_range]], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"timestamp":{"auto_date_histogram":{"field":"timestamp","buckets":3,"minimum_interval":null},"aggregations":{"value_range":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXNyABFqYXZhLnV0aWwuQ29sbFNlcleOq7Y6G6gRAwABSQADdGFneHAAAAADdwQAAAAGdAAHcm93VHlwZXQERHsKICAiZmllbGRzIjogWwogICAgewogICAgICAidWR0IjogIkVYUFJfVElNRVNUQU1QIiwKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAiQHRpbWVzdGFtcCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogImNhdGVnb3J5IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogInZhbHVlIgogICAgfSwKICAgIHsKICAgICAgInVkdCI6ICJFWFBSX1RJTUVTVEFNUCIsCiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogInRpbWVzdGFtcCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogIl9pZCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogIl9pbmRleCIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlJFQUwiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAibmFtZSI6ICJfc2NvcmUiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJSRUFMIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAiX21heHNjb3JlIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiQklHSU5UIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAiX3NvcnQiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJfcm91dGluZyIKICAgIH0KICBdLAogICJudWxsYWJsZSI6IHRydWUKfXQABGV4cHJ0Atp7CiAgIm9wIjogewogICAgIm5hbWUiOiAiQ0FTRSIsCiAgICAia2luZCI6ICJDQVNFIiwKICAgICJzeW50YXgiOiAiU1BFQ0lBTCIKICB9LAogICJvcGVyYW5kcyI6IFsKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIjwiLAogICAgICAgICJraW5kIjogIkxFU1NfVEhBTiIsCiAgICAgICAgInN5bnRheCI6ICJCSU5BUlkiCiAgICAgIH0sCiAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICB7CiAgICAgICAgICAiaW5wdXQiOiAyLAogICAgICAgICAgIm5hbWUiOiAiJDIiCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAibGl0ZXJhbCI6IDcwMDAsCiAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IGZhbHNlCiAgICAgICAgICB9CiAgICAgICAgfQogICAgICBdCiAgICB9LAogICAgewogICAgICAibGl0ZXJhbCI6ICJzbWFsbCIsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9LAogICAgewogICAgICAibGl0ZXJhbCI6ICJncmVhdCIsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9CiAgXQp9dAAKZmllbGRUeXBlc3NyABdqYXZhLnV0aWwuTGlua2VkSGFzaE1hcDTATlwQbMD7AgABWgALYWNjZXNzT3JkZXJ4cgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAMdwgAAAAQAAAABHQACkB0aW1lc3RhbXBzcgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRlVHlwZZ4tUq4QfcqvAgABTAAHZm9ybWF0c3QAEExqYXZhL3V0aWwvTGlzdDt4cgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRhVHlwZcJjvMoC+gU1AgADTAAMZXhwckNvcmVUeXBldAArTG9yZy9vcGVuc2VhcmNoL3NxbC9kYXRhL3R5cGUvRXhwckNv
cmVUeXBlO0wAC21hcHBpbmdUeXBldABITG9yZy9vcGVuc2VhcmNoL3NxbC9vcGVuc2VhcmNoL2RhdGEvdHlwZS9PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGU7TAAKcHJvcGVydGllc3QAD0xqYXZhL3V0aWwvTWFwO3hwfnIAKW9yZy5vcGVuc2VhcmNoLnNxbC5kYXRhLnR5cGUuRXhwckNvcmVUeXBlAAAAAAAAAAASAAB4cgAOamF2YS5sYW5nLkVudW0AAAAAAAAAABIAAHhwdAAJVElNRVNUQU1QfnIARm9yZy5vcGVuc2VhcmNoLnNxbC5vcGVuc2VhcmNoLmRhdGEudHlwZS5PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGUAAAAAAAAAABIAAHhxAH4AE3QABERhdGVzcgA8c2hhZGVkLmNvbS5nb29nbGUuY29tbW9uLmNvbGxlY3QuSW1tdXRhYmxlTWFwJFNlcmlhbGl6ZWRGb3JtAAAAAAAAAAACAAJMAARrZXlzdAASTGphdmEvbGFuZy9PYmplY3Q7TAAGdmFsdWVzcQB+ABp4cHVyABNbTGphdmEubGFuZy5PYmplY3Q7kM5YnxBzKWwCAAB4cAAAAAB1cQB+ABwAAAAAc3IAE2phdmEudXRpbC5BcnJheUxpc3R4gdIdmcdhnQMAAUkABHNpemV4cAAAAAF3BAAAAAF0ABdkYXRlX2hvdXJfbWludXRlX3NlY29uZHh0AAhjYXRlZ29yeX5xAH4AEnQABlNUUklOR3QABXZhbHVlfnEAfgASdAAHSU5URUdFUnQACXRpbWVzdGFtcHNxAH4AC3EAfgAUcQB+ABdxAH4AG3NxAH4AHwAAAAF3BAAAAAF0ABdkYXRlX2hvdXJfbWludXRlX3NlY29uZHh4AHg=\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1760342401776012000}},"size":1000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":{"_key":"asc"}},"aggregations":{"avg(value)":{"avg":{"field":"value"}}}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
"""
  }
}

I'll optimize this case and move the snippets of creating range buckets to createNestedBuckets.
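
For reference, the intended pushed-down shape nests the range aggregation under the auto date histogram instead of falling back to a scripted terms sub-aggregation. A minimal hand-written sketch with OpenSearch's aggregation builders, assuming the same buckets as the query above (this is not the actual createNestedBuckets implementation):

import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.histogram.AutoDateHistogramAggregationBuilder;
import org.opensearch.search.aggregations.bucket.range.RangeAggregationBuilder;

// bin timestamp bins=3 | eval value_range = case(value < 7000, 'small' else 'great')
// | stats avg(value) by timestamp, value_range
RangeAggregationBuilder valueRange = AggregationBuilders.range("value_range")
    .field("value")
    .addUnboundedTo("small", 7000)     // (-inf, 7000)
    .addUnboundedFrom("great", 7000)   // [7000, +inf)
    .keyed(true);
valueRange.subAggregation(AggregationBuilders.avg("avg(value)").field("value"));

AutoDateHistogramAggregationBuilder timestampBins =
    new AutoDateHistogramAggregationBuilder("timestamp").setNumBuckets(3);
timestampBins.field("timestamp");
timestampBins.subAggregation(valueRange);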

Collaborator Author:

Fixed & test case added

List<String> countAggNames = countAggNameAndBuilderPair.getLeft();

if (aggregate.getGroupSet().isEmpty()) {
Pair<Set<Integer>, AggregationBuilder> caseAggPushedAndRangeBuilder =
Collaborator:

Why should we handle the range bucket separately? Can it be handled in createNestedBuckets, like how we construct auto_date_span?

Collaborator:

It seems composite agg is more appropriate than constructing multiple sub-aggs via createNestedBuckets. It looks good to me to keep the current implementation. #4400 (comment).

We should refactor auto_date_span to have a similar implementation, but we still have to use createNestedBuckets when a query has both range_bucket and auto_date_span.

Collaborator Author:

Fixed

} else {
new BucketAggregationParser(metricParsers, countAggNames));
}
// It has both composite aggregation and range aggregation:
Collaborator:

Not related to this PR but to #4413: it seems we have a chance to optimize sub-aggs by combining the buckets that composite agg supports into a composite bucket.

e.g. transform termBucket-termBucket-autoDateSpanBucket to compositeBucket - autoDateSpanBucket.

We can also reorder buckets to extend this optimization to more cases.

e.g. transform termBucket-autoDateSpanBucket-termBucket to compositeBucket - autoDateSpanBucket.

Collaborator Author:

Fixed with the latest implementation

qianheng-aws previously approved these changes Oct 13, 2025
Comment on lines +230 to +233
Limitations
>>>>>>>>>>>

When each condition is a field comparison with a numeric literal and each result expression is a string literal, the query will be optimized as `range aggregations <https://docs.opensearch.org/latest/aggregations/bucket/range>`_ if pushdown optimization is enabled. However, this optimization has the following limitations:
Member:

IMO, it's not a limitation of the case function; it is just a restricted optimization. We can just call out in which cases the case function would be optimized to range DSL. Can we add some optimizable case usages to the user doc?

Collaborator Author:

I think the stated conditions are in the scope of restricted optimization, but the limitations are not, because we will still do the optimization regardless of whether the column contains null values or whether there is a default NULL range.

The problem is that there is no way to know in advance whether a column contains null values. Therefore, if we do this optimization, we always risk a discrepancy between the results with and without push-down.

- Null values will not be grouped into any bucket of a range aggregation and will be ignored
- The default ELSE clause will use the string literal ``"null"`` instead of actual NULL values

To avoid these edge-case limitations, set ``plugins.calcite.pushdown.enabled`` to false.
Member:

remove this

Collaborator Author:

Removed

Signed-off-by: Yuanchun Shen <[email protected]>
Comment on lines 219 to 228
// Push auto date span & case in group-by list into nested aggregations
Pair<Set<Integer>, AggregationBuilder> aggPushedAndAggBuilder =
    createNestedAggregation(groupList, project, subBuilder, helper);
Set<Integer> aggPushed = aggPushedAndAggBuilder.getLeft();
AggregationBuilder pushedAggBuilder = aggPushedAndAggBuilder.getRight();
// The group-by list after removing pushed aggregations
groupList = groupList.stream().filter(i -> !aggPushed.contains(i)).toList();
if (pushedAggBuilder != null) {
  subBuilder = new Builder().addAggregator(pushedAggBuilder);
}
Collaborator:

[non-blocking] Should this part of the code be put in the second branch of the if below for better readability? It would have a structure like:

if (aggregate.getGroupSet().isEmpty()) {
  // no group by
} else {
    // Push auto date span & case in group-by list into nested aggregations
   ...
   ...
   if (groupList.isEmpty()) {
        // No composite aggregation at top-level
        ...
   } else {
        // Composite aggregation at top level
        ...
    }
}

It works well with the current code, but it performs useless operations on some empty collections in the no-group-by case.

Collaborator Author:

Thanks for the suggestion! Optimized the logic here.

}
} else if (bucket instanceof Range.Bucket) {
// return null so that an empty range will be filtered out
if (bucket.getDocCount() == 0) {
Collaborator:

Put this at the beginning of this method to skip meaningless operations in advance?

Collaborator:

Please include InternalAutoDateHistogram.Bucket here as well to skip empty buckets for auto_date_span.

And also remove the code

if (aggregationBuilder.getLeft().size() == 1
    && aggregationBuilder.getLeft().getFirst()
        instanceof AutoDateHistogramAggregationBuilder autoDateHistogram) {
  // If it's auto_date_histogram, filter the empty bucket by using the first aggregate metrics
  RexBuilder rexBuilder = getCluster().getRexBuilder();
  Optional<AggregationBuilder> aggBuilderOpt =
      autoDateHistogram.getSubAggregations().stream().toList().stream().findFirst();
  RexNode condition =
      aggBuilderOpt.isEmpty() || aggBuilderOpt.get() instanceof ValueCountAggregationBuilder
          ? rexBuilder.makeCall(
              SqlStdOperatorTable.GREATER_THAN,
              rexBuilder.makeInputRef(newScan, 1),
              rexBuilder.makeLiteral(
                  0, rexBuilder.getTypeFactory().createSqlType(SqlTypeName.INTEGER)))
          : rexBuilder.makeCall(
              SqlStdOperatorTable.IS_NOT_NULL, rexBuilder.makeInputRef(newScan, 1));
  return LogicalFilter.create(newScan, condition);
}
here. Previously, we added another filter to filter out the empty bucket.

Collaborator Author:

Refactored as suggested

return null;
}
// the content of the range bucket is extracted with `r.put(name, bucket.getKey())` below
}
Collaborator:

Should there be another else here? Otherwise it will put bucket.getKey() into the results for the composite agg as well.

Collaborator Author:

Yep. I previously put all contents in bucket.getKey() to pass unit tests, which was wrong; the behavior is corrected now.

rows(39225.0, 1, "a30", "IL", "M"),
rows(48086.0, 1, "a30", "IN", "F"));

// 2.4 Composite (2 fields) - Range - Range - Metric (with count)
Collaborator:

Please add a test case for composite - auto_date_span - range - metric. Set bins to a value large enough, like 100, to verify that the empty buckets are filtered out properly.

Collaborator Author:

Test added

Merging this pull request may close these issues: [FEATURE] Support case command pushdown to range agg; [FEATURE] Support nested sub-aggregation parser.