Support project expression pushdown with derived field script #4288

songkant-aws · 2025-09-12T10:40:04Z

Description

Support project expression pushdown with derived field script.

This is the first phase of script project pushdown, with only partial support for script projects. Follow-up work, such as partial filter pushdown after script project pushdown, will be implemented later.

DerivedFieldScript Pros and Cons:

Pros:

DerivedFieldScript is evaluated in the search phase, so it allows some non-script filtering.
Some aggregations over derived fields are supported. See: https://docs.opensearch.org/latest/field-types/supported-field-types/derived/#aggregations

Cons:

Only a limited set of types is supported. See: https://docs.opensearch.org/latest/field-types/supported-field-types/derived/#emitting-values-in-scripts. For unsupported data types, Calcite needs one more layer to produce the correct data type.
Scoring and sorting are not supported.

Script field Pros and Cons:

Pros:

The output is an Object, which lets Calcite flexibly convert it to the right data type.

Cons:

script_fields is evaluated after the search phase, so it doesn't support filtering.
Sorting is not supported either.
It can't be used in aggregations.
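
For a concrete comparison, the two approaches differ in where the script sits in the search request. The sketch below is only illustrative: the field names (`age`, `lastname`, `age_plus_two`) and the range filter are hypothetical, and the bodies follow the derived field and `script_fields` request shapes described in the OpenSearch documentation rather than the exact requests this PR generates.

```java
// Illustrative request bodies only; all field names here are hypothetical.
final class DerivedVsScriptFields {

  /** A derived field defined in the search request; it can be used in the query and in "fields". */
  static String derivedFieldRequest() {
    return String.join(
        "\n",
        "{",
        "  \"derived\": {",
        "    \"age_plus_two\": {",
        "      \"type\": \"long\",",
        "      \"script\": { \"source\": \"emit(doc['age'].value + 2)\" }",
        "    }",
        "  },",
        "  \"query\": { \"range\": { \"age_plus_two\": { \"gte\": 30 } } },",
        "  \"fields\": [\"age_plus_two\", \"lastname\"]",
        "}");
  }

  /** A script field; it is computed for the returned hits and cannot be referenced by the query. */
  static String scriptFieldsRequest() {
    return String.join(
        "\n",
        "{",
        "  \"query\": { \"match_all\": {} },",
        "  \"script_fields\": {",
        "    \"age_plus_two\": {",
        "      \"script\": { \"source\": \"doc['age'].value + 2\" }",
        "    }",
        "  }",
        "}");
  }

  public static void main(String[] args) {
    System.out.println(derivedFieldRequest());
    System.out.println(scriptFieldsRequest());
  }
}
```

Note that the derived field script calls `emit(...)` and declares a `type`, which is where the type limitation listed above comes from, while the script field simply returns a value.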

Benchmark Results After Optimization

CalcitePPLBig5IT:
Summary:
asc_sort_timestamp: 8 ms
asc_sort_timestamp_can_match_shortcut: 13 ms
asc_sort_timestamp_no_can_match_shortcut: 12 ms
asc_sort_with_after_timestamp: 9 ms
bin_bins: 7 ms
bin_span_log: 8 ms
bin_span_time: 16 ms
composite_date_histogram_daily: 23 ms
composite_terms: 52 ms
composite_terms_keyword: 27 ms
date_histogram_hourly_agg: 13 ms
date_histogram_minute_agg: 22 ms
default: 9 ms
desc_sort_timestamp: 10 ms
desc_sort_timestamp_can_match_shortcut: 16 ms
desc_sort_timestamp_no_can_match_shortcut: 24 ms
desc_sort_with_after_timestamp: 9 ms
keyword_in_range: 23 ms
keyword_terms: 17 ms
keyword_terms_low_cardinality: 13 ms
multi_terms_keyword: 25 ms
query_string_on_message: 14 ms
query_string_on_message_filtered: 34 ms
query_string_on_message_filtered_sorted_num: 39 ms
range: 12 ms
range_auto_date_histo: 37 ms
range_auto_date_histo_with_metrics: 72 ms
range_field_conjunction_big_range_big_term_query: 10 ms
range_field_conjunction_small_range_big_term_query: 8 ms
range_field_conjunction_small_range_small_term_query: 15 ms
range_field_disjunction_big_range_small_term_query: 10 ms
range_numeric: 11 ms
range_with_asc_sort: 17 ms
range_with_desc_sort: 15 ms
scroll: 7 ms
sort_keyword_can_match_shortcut: 14 ms
sort_keyword_no_can_match_shortcut: 14 ms
sort_numeric_asc: 14 ms
sort_numeric_asc_with_match: 17 ms
sort_numeric_desc: 22 ms
sort_numeric_desc_with_match: 15 ms
term: 17 ms
terms_significant_1: 19 ms
terms_significant_2: 16 ms
Total 44 queries succeed. Average duration: 18 ms

CalcitePPLClickBenchIT:
Summary:
q1: 21 ms
q10: 59 ms
q11: 26 ms
q12: 31 ms
q13: 15 ms
q14: 21 ms
q15: 18 ms
q16: 13 ms
q17: 14 ms
q18: 10 ms
q19: 19 ms
q2: 18 ms
q20: 8 ms
q21: 11 ms
q22: 19 ms
q23: 24 ms
q24: 17 ms
q25: 15 ms
q26: 13 ms
q27: 15 ms
q28: 32 ms
q3: 21 ms
q31: 29 ms
q32: 31 ms
q33: 24 ms
q34: 12 ms
q35: 15 ms
q36: 18 ms
q37: 24 ms
q38: 22 ms
q39: 23 ms
q4: 16 ms
q40: 26 ms
q41: 27 ms
q42: 23 ms
q43: 31 ms
q5: 12 ms
q6: 12 ms
q7: 17 ms
q8: 21 ms
q9: 23 ms
Total 41 queries succeed. Average duration: 20 ms

Related Issues

Resolves #3387

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <[email protected]>

Signed-off-by: Songkan Tang <[email protected]>

songkant-aws · 2025-09-12T10:42:50Z

...ava/org/opensearch/sql/opensearch/planner/physical/OpenSearchScriptProjectIndexScanRule.java

+                                                OpenSearchIndexScanRule::isScriptProjectPushed)
+                                            .and(OpenSearchIndexScanRule::isProjectPushed)
+                                            .and(OpenSearchIndexScanRule::noLimitPushed)


Ideally, we wouldn't need such a complex condition check. Script project pushdown can be merged into the project pushdown method. I will optimize it later.

Merged the two kinds of project pushdown into one method. This reduces their dependency on each other and the number of plan rewrites. Additionally, if we introduce more rules related to project pushdown, the current logic will be easier to modify.
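
As a rough illustration of the kind of guard discussed in this thread, the sketch below composes pushdown conditions with `java.util.function.Predicate`. `ScanState` and its flags are hypothetical stand-ins, not the plugin's actual classes, and the exact combination of conditions is an assumption based on the diff excerpt.

```java
import java.util.function.Predicate;

final class PushdownGuards {
  /** Hypothetical summary of what has already been pushed into a scan. */
  record ScanState(boolean projectPushed, boolean scriptProjectPushed, boolean limitPushed) {}

  static boolean noLimitPushed(ScanState scan) {
    return !scan.limitPushed();
  }

  /** Assumed guard: no script project yet, a plain project already pushed, and no limit pushed. */
  static final Predicate<ScanState> CAN_PUSH_SCRIPT_PROJECT =
      Predicate.<ScanState>not(ScanState::scriptProjectPushed)
          .and(ScanState::projectPushed)
          .and(PushdownGuards::noLimitPushed);

  public static void main(String[] args) {
    // Plain project pushed, nothing else: the guard passes.
    System.out.println(CAN_PUSH_SCRIPT_PROJECT.test(new ScanState(true, false, false))); // true
    // A script project is already in the scan: the guard rejects it.
    System.out.println(CAN_PUSH_SCRIPT_PROJECT.test(new ScanState(true, true, false))); // false
  }
}
```

Each extra rule that depends on another rule's state adds one more clause to a chain like this, which is the coupling that merging the two pushdowns into one method avoids.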

Signed-off-by: Songkan Tang <[email protected]>

qianheng-aws · 2025-10-13T06:44:25Z

#4245 has been merged and opensearch-project/OpenSearch#19271 has also been addressed by core.

Any other blocker or concern for this PR? @songkant-aws @LantaoJin @yuancu

Signed-off-by: Songkan Tang <[email protected]>

qianheng-aws · 2025-10-13T08:57:27Z

...earch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java

        }
          // Ignored Project in cost accumulation, but it will affect the external cost
-        case PROJECT -> {}
+        case PROJECT, SCRIPT_PROJECT -> {}


Shall we count the cost of SCRIPT_PROJECT, since it should bring more overhead to the cluster than PROJECT? Otherwise the cost computation will be too unfair to non-pushdown plans.

Added a new cost calculation for SCRIPT_PROJECT.
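
To make the reviewer's point concrete, here is a minimal sketch of a cost accumulation in which a plain PROJECT stays free while a SCRIPT_PROJECT adds a per-row charge. The enum, the method, and the weight are hypothetical; the actual factor chosen in the PR is not shown in this thread.

```java
import java.util.List;

final class PushdownCostSketch {
  /** Hypothetical subset of pushdown operation types. */
  enum PushDownType { PROJECT, SCRIPT_PROJECT }

  // Assumed per-row weight for running a script; not the value used in the PR.
  static final double SCRIPT_PROJECT_FACTOR = 1.1;

  /** Sums the cost of the operations pushed into a scan over rowCount rows. */
  static double accumulate(double rowCount, List<PushDownType> pushed) {
    double cost = 0;
    for (PushDownType type : pushed) {
      switch (type) {
        // A plain projection only trims columns, so it adds nothing here.
        case PROJECT -> {}
        // A script runs once per row, so it is charged proportionally to the row count.
        case SCRIPT_PROJECT -> cost += rowCount * SCRIPT_PROJECT_FACTOR;
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    System.out.println(accumulate(1000, List.of(PushDownType.PROJECT)));
    System.out.println(accumulate(1000, List.of(PushDownType.PROJECT, PushDownType.SCRIPT_PROJECT)));
  }
}
```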

Signed-off-by: Songkan Tang <[email protected]>

qianheng-aws · 2025-10-14T03:15:48Z

...main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchProjectIndexScanRule.java

+    for (int i = 0; i < projExprs.size(); i++) {
+      final RexNode projExpr = projExprs.get(i);
+      if (isPushableNewDerived(projExpr, derivedIndexSet, scan)) {
+        final String uniquifiedAlias =


With this method, ... | eval a = a + 1 will produce a new derived field a1, won't it?

Yes, it will produce a0.

Interestingly, I find that although it may physically generate an EnumerableScan with a rowType like [age0 BIGINT], the EnumerableScan's schema is still [age BIGINT]. I think this logic, RexUtil.isIdentity(newExprs, newScan.getRowType()), decides that as long as the enumerator result is correct (the RexInput is the same), the actual scan's rowType doesn't matter.

Added an IT called testFieldsWithNameConflictDerivedFieldPushdown to ensure query correctness.
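
The naming behavior described in this thread (`age` becoming `age0` when the name is already taken) can be sketched with a small helper. This is a simplified, hypothetical uniquifier written for illustration, not the code path the PR uses.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AliasUniquifier {
  /**
   * Returns name if it is unused, otherwise name plus the smallest non-negative
   * integer suffix that makes it unique. The chosen alias is recorded in used.
   */
  static String uniquify(String name, Set<String> used) {
    String candidate = name;
    for (int i = 0; used.contains(candidate); i++) {
      candidate = name + i;
    }
    used.add(candidate);
    return candidate;
  }

  public static void main(String[] args) {
    // Names already present in the index mapping (hypothetical).
    Set<String> used = new LinkedHashSet<>(List.of("age", "lastname"));
    System.out.println(uniquify("age", used)); // age0
    System.out.println(uniquify("age", used)); // age1
  }
}
```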

qianheng-aws · 2025-10-14T03:23:54Z

...main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchProjectIndexScanRule.java

-      if (isSequential && !Objects.equals(integer, current++)) {
+    public boolean add(SelectedColumn item) {
+      if (isSequential
+          && item.getKind() == Kind.PHYSICAL


Should Kind.DERIVED_EXISTING be included here as well?

qianheng-aws · 2025-10-14T04:49:31Z

...main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchProjectIndexScanRule.java

+                if (!seenOldIndex.get(oldIdx)) {
+                  seenOldIndex.set(oldIdx);
+                  if (derivedIndexSet.get(oldIdx)) {
+                    selected.add(SelectedColumn.derivedExisting(oldIdx));


What's the purpose of distinguishing DERIVED_EXISTING from PHYSICAL?

I wonder whether the reason we need DERIVED_EXISTING is the SCAN-PROJECT-PROJECT case. Or shall we only handle the SCAN-PROJECT case, which should be produced by the project merge rule, and prevent project pushdown if a SCRIPT_PROJECT has already been pushed into the scan?

As discussed offline, ProjectMergeRule can't always merge projects. For example, in project(a) - sort a + b - project(a, a + b) - scan, a + b is a complex expression that requires an immediately following sort. In this case, it would be more straightforward to allow multiple project pushdowns, although the inner logic is more complex.

Also, allowing multiple project pushdowns brings more flexibility. If we don't see this requirement in the future, we can disable it.
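
A minimal sketch of the classification in the diff excerpt above, for the second pushdown in a SCAN-PROJECT-PROJECT plan. The `SelectedColumn` record here is a hypothetical reduction of the real class; only the `derivedExisting` factory and the `seenOldIndex` / `derivedIndexSet` names come from the diff.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SelectedColumnSketch {
  enum Kind { PHYSICAL, DERIVED_EXISTING }

  /** Hypothetical reduction of the PR's SelectedColumn: a kind plus the old scan index. */
  record SelectedColumn(Kind kind, int oldIndex) {
    static SelectedColumn physical(int oldIndex) {
      return new SelectedColumn(Kind.PHYSICAL, oldIndex);
    }

    static SelectedColumn derivedExisting(int oldIndex) {
      return new SelectedColumn(Kind.DERIVED_EXISTING, oldIndex);
    }
  }

  /** Visits each referenced scan column once and tags it by whether an earlier pushdown derived it. */
  static List<SelectedColumn> select(int[] referencedOldIndexes, BitSet derivedIndexSet) {
    BitSet seenOldIndex = new BitSet();
    List<SelectedColumn> selected = new ArrayList<>();
    for (int oldIdx : referencedOldIndexes) {
      if (!seenOldIndex.get(oldIdx)) {
        seenOldIndex.set(oldIdx);
        selected.add(
            derivedIndexSet.get(oldIdx)
                ? SelectedColumn.derivedExisting(oldIdx)
                : SelectedColumn.physical(oldIdx));
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // Scan after the first pushdown: column 0 is the physical field a, column 1 is the derived a + b.
    BitSet derivedIndexSet = new BitSet();
    derivedIndexSet.set(1);
    // The upper project references columns 1, 0 and 1 again.
    System.out.println(select(new int[] {1, 0, 1}, derivedIndexSet));
  }
}
```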

qianheng-aws · 2025-10-14T05:16:17Z

...main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchProjectIndexScanRule.java

+        final int pos = projIdxToNewPos.get(i);
+        newExprs.add(call.builder().getRexBuilder().makeInputRef(projExprs.get(i).getType(), pos));
+      } else {
+        newExprs.add(RexUtil.apply(oldIdxToNewPos, projExprs.get(i)));


RexUtil.apply creates a new shuttle on every call. It seems expensive to create one for each projExpr, since the shuttle would be the same each time.

https://github.com/opensearch-project/sql/pull/3951/files#diff-5ffab7c85f9c37e1ce56e8742848701dbe1baa77149aed8868189c01f51c8436

How about creating our own extended RexPermuteInputsShuttle and AbstractMapping? That shuttle should be able to handle all kinds of SelectedItems. Then the mapping construction and expression transformation could be simplified. I did similar work in the draft PR above.
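
The idea of building the remapping visitor once and reusing it for every expression can be sketched without Calcite. The toy `Expr` tree and `IndexRemapper` below are hypothetical stand-ins for `RexNode` and a shuttle; they only illustrate the reuse pattern, not Calcite's API.

```java
import java.util.ArrayList;
import java.util.List;

final class RemapSketch {
  /** Toy expression tree standing in for Calcite's RexNode (hypothetical). */
  interface Expr {}

  record InputRef(int index) implements Expr {}

  record Call(String op, List<Expr> operands) implements Expr {}

  /** Built once from the old-to-new index mapping, then applied to every expression. */
  static final class IndexRemapper {
    private final int[] oldToNew;

    IndexRemapper(int[] oldToNew) {
      this.oldToNew = oldToNew.clone();
    }

    Expr apply(Expr expr) {
      if (expr instanceof InputRef ref) {
        return new InputRef(oldToNew[ref.index()]);
      }
      Call call = (Call) expr;
      List<Expr> remapped = new ArrayList<>();
      for (Expr operand : call.operands()) {
        remapped.add(apply(operand));
      }
      return new Call(call.op(), remapped);
    }
  }

  public static void main(String[] args) {
    // Old column 0 moves to position 2, 1 to 0, and 2 to 1.
    IndexRemapper remapper = new IndexRemapper(new int[] {2, 0, 1});
    List<Expr> projExprs =
        List.of(new InputRef(1), new Call("+", List.of(new InputRef(0), new InputRef(2))));
    for (Expr projExpr : projExprs) {
      // The same remapper instance is reused for each projection expression.
      System.out.println(remapper.apply(projExpr));
    }
  }
}
```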

qianheng-aws · 2025-10-14T06:18:08Z

.../main/java/org/opensearch/sql/opensearch/planner/physical/OpenSearchFilterIndexScanRule.java

+                                            // Script filter of derived field input is not supported
+                                            .and(
+                                                Predicate.not(
+                                                    OpenSearchIndexScanRule::isScriptProjectPushed))


What will happen if agg/sort is pushed down on a derived field? Could you please add a test for that case?

I banned agg pushdown and complex sort expression pushdown on derived fields. Agg pushdown will match agg - project - scan and optimize it with our own rule. Added a test case, testScriptSort. The existing agg tests should already cover agg pushdown.

Signed-off-by: Songkan Tang <[email protected]>

songkant-aws · 2025-10-20T02:18:17Z

Could you take another look at this PR? @qianheng-aws @LantaoJin @yuancu

yuancu

LGTM

yuancu · 2025-10-21T03:16:34Z

integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java

+            "source=opensearch-sql_test_index_account"
+                + "| eval age = age + 2"
+                + "| fields age, lastname"));
+  }


It seems that in the plan the new age field becomes age0. I'm curious where it is set back to the name age.

It seems the column names in our final results are derived from the original plan (i.e. the logical plan), so the final plan (i.e. the physical plan) is allowed to produce a different row type as long as the types match.
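
A small sketch of that explanation: if result columns are labeled positionally from the logical plan's field names, the physical scan's own names (such as `age0`) never reach the user. The helper and the sample values below are hypothetical and only illustrate the positional labeling.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResultNaming {
  /** Labels one physical row with the logical plan's field names, position by position. */
  static Map<String, Object> label(List<String> logicalNames, List<Object> physicalRow) {
    if (logicalNames.size() != physicalRow.size()) {
      throw new IllegalArgumentException("logical and physical row widths differ");
    }
    Map<String, Object> labeled = new LinkedHashMap<>();
    for (int i = 0; i < logicalNames.size(); i++) {
      labeled.put(logicalNames.get(i), physicalRow.get(i));
    }
    return labeled;
  }

  public static void main(String[] args) {
    // The physical scan exposes [age0, lastname]; the logical plan says [age, lastname].
    List<String> logicalNames = List.of("age", "lastname");
    List<Object> physicalRow = List.of(34L, "Bond");
    System.out.println(label(logicalNames, physicalRow)); // {age=34, lastname=Bond}
  }
}
```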

qianheng-aws · 2025-10-21T08:53:43Z

@LantaoJin Please take another look at this PR.

LantaoJin and others added 13 commits July 25, 2025 19:00

Support script project pushdown

e1609c7

Signed-off-by: Lantao Jin <[email protected]>

Merge remote-tracking branch 'upstream/main' into issues/3387

46e27ff

Fix IT

e563b5b

Signed-off-by: Lantao Jin <[email protected]>

Merge branch 'main' into pr/issues/3387

efef12c

Resolve compile issue and use derived field script

f197c4d

Signed-off-by: Songkan Tang <[email protected]>

Merge branch 'main' into project-pushdown

c693625

Fix derived field pushdown with correct project names and setup

d6061e9

Signed-off-by: Songkan Tang <[email protected]>

Fix pushed derived field name key conflicts with index fields issue

65973d8

Signed-off-by: Songkan Tang <[email protected]>

Merge branch 'main' into project-pushdown

bb9cb2c

Exclude some cases that are not supported

688b5c9

Signed-off-by: Songkan Tang <[email protected]>

Merge branch 'main' into project-pushdown

2794e1c

Exclude agg pushdown case after script project pushdown

34aa546

Signed-off-by: Songkan Tang <[email protected]>

Refactor the code a bit

5f1b61d

Signed-off-by: Songkan Tang <[email protected]>

songkant-aws commented Sep 12, 2025


songkant-aws changed the title ~~[Feature] Support project expression pushdown with derived field script~~ Support project expression pushdown with derived field script Sep 12, 2025

songkant-aws marked this pull request as ready for review September 12, 2025 10:45

songkant-aws requested review from MaxKsyunz, Swiddis, YANG-DB, Yury-Fridlyand, anirudha, dai-chen, derek-ho, joshuali925, kavithacm, mengweieric, penghuo, ps48, seankao-az and vamsimanohar as code owners September 12, 2025 10:45

songkant-aws force-pushed the project-pushdown branch from 2f2e0cc to debae75 Compare September 23, 2025 09:52

Minor fixes to save script length

c7e154b

Signed-off-by: Songkan Tang <[email protected]>

songkant-aws force-pushed the project-pushdown branch from debae75 to c7e154b Compare September 23, 2025 09:56

Fix no pushdown IT

5e9ec59

Signed-off-by: Songkan Tang <[email protected]>

songkant-aws force-pushed the project-pushdown branch from aceee3e to 5e9ec59 Compare September 23, 2025 10:19

songkant-aws added 4 commits September 24, 2025 11:09

Merge branch 'main' into project-pushdown

ec6ca9f

Enable valid UDT type in script project pushdown

b4815b4

Signed-off-by: Songkan Tang <[email protected]>

Merge branch 'main' into project-pushdown

29882e9

Merge branch 'main' into project-pushdown

01e38a2

songkant-aws added 2 commits October 13, 2025 14:52

Correct explained plans after merge

9bc1fc3

Signed-off-by: Songkan Tang <[email protected]>

Fix spotless check

62827f4

Signed-off-by: Songkan Tang <[email protected]>

qianheng-aws reviewed Oct 13, 2025


Support float type project pushdown

68cd3c7

Signed-off-by: Songkan Tang <[email protected]>

qianheng-aws reviewed Oct 14, 2025


songkant-aws added 10 commits October 15, 2025 09:57

Fix expected result of testDivide

8210764

Signed-off-by: Songkan Tang <[email protected]>

Merge branch 'main' into project-pushdown

037476a

Address comments and relax pushdown row count factor

1da8670

Signed-off-by: Songkan Tang <[email protected]>

Fix explain cost test

3f046f3

Signed-off-by: Songkan Tang <[email protected]>

Merge branch 'main' into project-pushdown

3e501b2

Add more test cases

72d4481

Signed-off-by: Songkan Tang <[email protected]>

Transform test json files to yaml files

e36d38b

Signed-off-by: Songkan Tang <[email protected]>

Minor change to instantiate single instance of remap index visitor

70afed0

Signed-off-by: Songkan Tang <[email protected]>

Add an IT to check query correctness

bc91275

Signed-off-by: Songkan Tang <[email protected]>

Fix spotless check

25e61d8

Signed-off-by: Songkan Tang <[email protected]>

yuancu approved these changes Oct 21, 2025


qianheng-aws approved these changes Oct 21, 2025

