-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add support for TopN pushdown for Oracle #26688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add support for TopN pushdown for Oracle #26688
Conversation
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
38c39d6
to
ee583ef
Compare
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
according a google search, Oracle 11 is EOL and we generally do not support old versions of 3rd party products that are already sunset. it should be fine to move on, especially when it simplifies connector code as it is in this case |
Should the upgrade be done as part of this task? TestingOracleServer and TestJdbcVendorCompatibility use gvenzl/oracle-xe:11.2.0.2-full. |
82b935e
to
ffb0956
Compare
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
61d42c2
to
ed60393
Compare
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Show resolved
Hide resolved
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
4c33a89
to
1149ac8
Compare
// In Oracle BINARY collation provides byte-based comparison | ||
collation = "COLLATE BINARY"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Treating the bytes as signed or unsigned?
It's worth a code comment explaining that Trino string codepoint-based collation is equivalent to sorting (unsigned) bytes ... in UTF-8 encoding.
When we sent COLLATE BINARY
to oracle, are we assuming the data is compared byte-wise on Utf-8 encoding, or can it be something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this comment:
Used COLLATE BINARY for ORDER BY for string types, in order to enforce a comparison similar to Trino.
Trino encodes all text as UTF-8, and sorts text as Unicode codepoints.
In Oracle BINARY collation provides unsigned byte-by-byte comparison.
If the Oracle DB uses UTF8 or AL32UTF8 charset, then BINARY collation will match Trino's sorting.
If the Oracle DB uses a non UTF8 charset, like LATIN1, then characters outside ASCII range
will not sort the same way as in Trino.
We have a limitation here, which seems similar to what exists for PostgreSQL connector.
PostgreSQL COLLATE "C" and Oracle COLLATE BINARY will do a byte-for-byte comparison. If the DB encoding is UTF, then this will match Trino's UTF-8 codepoint comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. that is a good comment.
Is this why PostgreSQL connector uses COLLATE C?
If the Oracle DB uses a non UTF8 charset, like LATIN1, then characters outside ASCII range
will not sort the same way as in Trino.
this sounds like a potential correctness issue. thanks for calling it out! cc @ebyhr @martint
I am not an Oracle DBA. Is it a known figure how often Oracle DBs uses non-UTF8 charset? Is it a theoretical problem or something to address? Can this be addressed automatically?
NVARCHAR2(size) Variable-length Unicode character string having maximum length size characters. You must specify size for NVARCHAR2. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding.
it sounds like Oracle assumes each codepoint is at most 3 bytes in UTF-8 encoding, whereas actual UTF-8 can produce 4 bytes for a single codepoint.
What does Oracle do when string contains codepoints > U+FFFF (outside Basic Multilingual Plane, like emojis)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a boolean property to enable/disable the topN pushdown. The pushdown could be enabled by default, and the user could disable it in case the character encoding in not UTF8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to inspect oracle server side setting to know whether pushdown is safe to do?
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
258eca4
to
e713e73
Compare
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java
Outdated
Show resolved
Hide resolved
4efe27d
to
b822377
Compare
The newest FETCH FIRST number ROWS ONLY syntax is used (supported in Oracle version >=12). Trino no longer supports older versions of Oracle (i.e. v11), which support only WHERE ROWNUM <= number syntax. We applied the FETCH FIRST syntax to the LIMIT expression, which was converted before to "SELECT * FROM (%s ORDER BY %s) WHERE ROWNUM <= %s". Oracle sets NULL values last by default. Sometimes, however, the default does not work, so we used explicit flag for nulls. Used COLLATE BINARY for ORDER BY for string types, in order to enforce a comparison similar to Trino. Trino encodes all text as UTF-8, and sorts text as Unicode codepoints. In Oracle BINARY collation provides unsigned byte-by-byte comparison. If the Oracle DB uses UTF8 or AL32UTF8 charset, then BINARY collation will match Trino's sorting. If the Oracle DB uses a non UTF8 charset, like WE8ISO8859P1 or WE8MSWIN1252, then Oracle will not sort the same way as in Trino.
0702643
to
893305a
Compare
// If the Oracle DB uses a non UTF8 charset, like WE8ISO8859P1 or WE8MSWIN1252, | ||
// then Oracle will not sort the same way as in Trino. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kokosing please help me understand whether this is a very unlikely edge case that we can leave to be addressed later, or a pretty common thing, requiring this new feature to be opt-in
also, do we need a kill-switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collations are common.
// If the Oracle DB uses a non UTF8 charset, like WE8ISO8859P1 or WE8MSWIN1252, | ||
// then Oracle will not sort the same way as in Trino. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collations are common.
|
||
private boolean isCollatableInOracle(JdbcColumnHandle column) | ||
{ | ||
String jdbcTypeName = column.getJdbcTypeHandle().jdbcTypeName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have io.trino.plugin.jdbc.JdbcTypeHandle#caseSensitivity
. Can we use it to determine types with the same collation as in Trino?
Add support for TopN pushdown for Oracle
Fixes #26566
Description
The newest FETCH FIRST number ROWS ONLY syntax is used (supported in Oracle version >=12).
Trino no longer supports older versions of Oracle (i.e. v11), which support only WHERE ROWNUM <= number syntax.
We applied the FETCH FIRST syntax to the LIMIT expression, which was converted before to "SELECT * FROM (%s ORDER BY %s) WHERE ROWNUM <= %s".Oracle sets NULL values last by default. Sometimes, however, the default does not work, so we used explicit flag for nulls.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: