Skip to content

Conversation

@kevintang2022
Copy link
Owner

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@pdabre12 pdabre12 force-pushed the native-inlined-sql-functions-module branch 6 times, most recently from 94561b4 to 627d4f9 Compare September 3, 2025 22:36
shrinidhijoshi and others added 4 commits September 4, 2025 14:15
…esto-spark code in presto-native-execution module
…tercept functions (prestodb#25475) (prestodb#25748)

Summary:

## Context
Currently we don't enforce intermediate/return type are the same in
Coordinator and Prestissimo Worker.
Velox creates vectors for intermediate/return results based on a plan
that comes from Coordinator. Then Prestissimo tries to use those vector
and not crash.

In practise we had a crash some time ago due to such a mismatch
(D74199165).
And I added validation to Velox to catch such kind of mismatches early:
facebookincubator/velox#13322
But we wasn't able to enable it in prod, because the validation failed
for "regr_slope" and "regr_intercept" functions.

## What's changed?
In this diff I'm fixing "regr_slope" and "regr_intercept" intermediate
types. Basically in Java `AggregationState` for all these functions is
the same:
```
    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
    AggregationFunction("regr_sxy")
    AggregationFunction("regr_sxx")
    AggregationFunction("regr_syy")
    AggregationFunction("regr_r2")
    AggregationFunction("regr_count")
    AggregationFunction("regr_avgy")
    AggregationFunction("regr_avgx")
```
But in Prestissimo the state storage is more optimal:
```
    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
```
These 2 aggregation functions don't have M2Y field. And this is more
efficient, because we don't waste memory and CPU on the field, that
aren't needed.

So I moved M2Y to extended class, the same as it works in Velox:
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/CovarianceAggregates.cpp?fbclid=IwY2xjawLRTetleHRuA2FlbQIxMQBicmlkETFiT0N3UFR0M2VKOHl6MHRhAR6KRQ1VUQdCkZXzwj14sMQrVZ-R9QBH1utuGJb5U_lyGzDwt8PwV317QRVNJg_aem_-ePxZ-fHO5MNgfUmayVJFA#L326-L337

No major changes, mostly just reorganized the code.

## Test plan
I tested `REGR_SLOPE`, `REGR_INTERCEPT` and `REGR_R2` functions since
they are heavily used in prod and cover both cases: with and without M2Y
field.

What my test looked like. For all 3 `REGR_*` functions I found some prod
queries, then:
1. Ran them on prev Java build
2. Ran them on new (with this PR) Java build
3. Ran them on prev Prestissimo build
4. Ran them on new (with this PR) Prestissimo build

And compared the output results. They all were identical.
With this manual test we covered `Coordinator -> Java Worker` and
`Coordinator -> Prestissimo Worker` integrations.

## Next steps
In this diff I'm trying to apply the same optimization to Java. With
this fix, the signatures will become the same in Java and Prestissimo
and we will be able to enable the validation

Differential Revision: D77625566

== NO RELEASE NOTES ==
@pdabre12 pdabre12 force-pushed the native-inlined-sql-functions-module branch from 627d4f9 to 5497c15 Compare September 5, 2025 17:48
@pdabre12 pdabre12 force-pushed the native-inlined-sql-functions-module branch from 5497c15 to 41e2705 Compare September 5, 2025 18:15
…ecar enabled clusters

Adds a new plugin : presto-native-sql-invoked-functions-plugin that contains all inlined SQL functions except those with overridden native
implementations. This plugin is intended to be loaded only in sidecar enabled clusters.
@pdabre12 pdabre12 force-pushed the native-inlined-sql-functions-module branch from 41e2705 to 0458101 Compare September 5, 2025 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants