Releases: aliyun/aliyun-odps-java-sdk
v0.50.0-rc1
Changelog
[0.50.0-rc1] - 2024-09-19
Features
- SQLExecutor adds the isUseInstanceTunnel method:
  - Used to determine whether instanceTunnel is used to obtain results
Fix
- Fixed an issue where, when using SQLExecutor to execute MCQA 2.0 jobs, executing a CommandApi task affected the next job and caused an NPE to be thrown when retrieving results.
v0.50.0-rc0
Changelog
[0.50.0-rc0] - 2024-09-18
Features
- SQLExecutor supports submitting MCQA 2.0 jobs
- SQLExecutorBuilder adds method enableMcqaV2
- SQLExecutorBuilder adds getter methods for fields
- SQLExecutor adds getQueryId method:
  - For offline jobs and MCQA 2.0 jobs, it returns the currently executing job's InstanceId
  - For MCQA 1.0 jobs, it returns the InstanceId and SubQueryId
- TableAPI adds SharingQuotaToken parameter in EnvironmentSettings to support sharing quota resources during job submission
- Quotas introduces getWlmQuota method:
  - Allows retrieval of detailed quota information based on projectName and quotaNickName, including whether it belongs to interactive quotas
- Quota class adds isInteractiveQuota method to determine if a quota belongs to interactive quotas (suitable for MCQA 2.0)
- Adds getResultByInstanceTunnel(Instance instance, String taskName, Long limit, boolean limitEnabled) method:
  - Allows unlimited retrieval of results via instanceTunnel (lifting restrictions requires higher permissions)
- UpsertSession.Builder adds setLifecycle method to configure the session lifecycle
Fixes
- Fixed an issue where specifying limitEnabled had no effect when using SQLExecutor to execute offline jobs
- The SQLExecutor getQueryId method now returns the job's instanceID instead of null when executing offline jobs
- When instanceTunnel is used to retrieve results, non-select statements no longer throw an exception; retrieval falls back to non-tunnel logic
- Fixed an issue where DownloadSession lost one record if an error occurred when the read count equaled the number of records to be read minus one
- The clone method of the Odps class now correctly clones other fields, including tunnelEndpoint
- The Instance's getRawTaskResults method no longer makes multiple requests when processing synchronous jobs
v0.49.0-public
Changelog
[0.49.0-public] - 2024-09-12
Features
- OdpsRecordConverter Enhancement: Now supports converting data to SQL-compatible formats. For example, for the LocalDate type, data can be converted to "DATE 'yyyy-mm-dd'" format. Additionally, for the Binary type, hex representation format is now supported.
- Enhanced Predicate Pushdown for Storage Constants: Improved the behavior of the Constant class and added the Constant.of(Object, TypeInfo) method. Now, when a type is set or identified as a time type, it is converted to SQL-compatible format correctly (enabling correct pushdown of time types). Other type conversion issues have been fixed; an IllegalArgumentException will be thrown during session creation when conversion to SQL-compatible mode is not possible.
- UpsertSession Implements Closeable Interface: Notifies users to properly release local resources of the UpsertSession.
- SQLExecutorBuilder New Method offlineJobPriority: Allows setting the priority of the offline job when a job falls back.
- New Method in Table Class getLastMajorCompactTime: Used to retrieve the last time the table underwent major compaction.
- New Method in Instance Class create(Job job, boolean tryWait): When the tryWait parameter is true, the job will attempt to wait on the server for a period of time to obtain results more quickly.
- Resource Class Enhancement: Now able to determine if the corresponding resource is a temporary resource.
- CreateProjectParma Class Enhancement: Added defaultCtrlService parameter to specify the default control cluster of the project.
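
To make the two target formats in the OdpsRecordConverter entry concrete, here is a minimal sketch using only the JDK. The class and method names are hypothetical and are not part of the SDK; the lowercase, unprefixed hex style is also an assumption, since the changelog does not say which hex style the converter emits.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not SDK code: illustrates the output formats only.
final class SqlLiteralSketch {
    // ISO_LOCAL_DATE renders the date as yyyy-MM-dd.
    static String dateLiteral(LocalDate date) {
        return "DATE '" + date.format(DateTimeFormatter.ISO_LOCAL_DATE) + "'";
    }

    // Renders each byte as two lowercase hex digits (style is an assumption).
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dateLiteral(LocalDate.of(2024, 9, 12))); // DATE '2024-09-12'
        System.out.println(hex(new byte[] {0x0a, (byte) 0xff}));    // 0aff
    }
}
```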
Fixes
- UpsertStream NPE Fix: Fixed an issue where an NPE was thrown during flush when a local error occurred, preventing a proper retry.
- Varchar/Char Type Fix: Fixed an issue where special characters such as Chinese symbols or emoticons were incorrectly counted twice when the length of a Varchar/Char value was computed.
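
The changelog does not state the root cause of the Varchar/Char miscount. One common way double counting arises in Java is measuring a string in UTF-16 code units rather than code points, so the sketch below shows that difference under that assumption. The class is hypothetical and not SDK code.

```java
// Hypothetical helper, not SDK code: contrasts UTF-16 code units with code points.
final class CharLengthSketch {
    // String.length() counts UTF-16 code units, so a surrogate pair counts as 2.
    static int utf16Units(String s) {
        return s.length();
    }

    // codePointCount counts each Unicode code point once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "ok\uD83D\uDE00"; // "ok" followed by U+1F600, an emoji outside the BMP
        System.out.println(utf16Units(s)); // 4
        System.out.println(codePoints(s)); // 3
    }
}
```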
v0.48.8-public
Changelog
[0.48.8-public] - 2024-08-12
Enhancement
- Introduced internal validation of compound predicate expressions, fixed logic when handling
invalid or always true/false predicates, enhanced test coverage, and ensured stability and
accuracy in complex query optimization.
v0.48.7-public
Changelog
[0.48.7-public] - 2024-08-07
Enhancements
- TableTunnel Configuration Optimization: Introduced the tags attribute to the TableTunnel Configuration, enabling users to attach custom tags to tunnel operations for enhanced logging and management. These tags are recorded in the tenant-level information schema.

```java
import com.aliyun.odps.Odps;
import com.aliyun.odps.tunnel.Configuration;
import com.aliyun.odps.tunnel.TableTunnel;
import java.util.Arrays;

Odps odps; // assumed to be an initialized client
Configuration configuration =
    Configuration.builder(odps)
        .withTags(Arrays.asList("tag1", "tag2")) // tags attached to tunnel operations
        .build();
TableTunnel tableTunnel = odps.tableTunnel(configuration);
// Proceed with tunnel operations
```

- Instance Enhancement: Added the waitForTerminatedAndGetResult method to the Instance class, integrating the optimization strategies applied to the SQLExecutor interface in versions 0.48.6 and 0.48.7 and enhancing operational efficiency. Refer to com.aliyun.odps.sqa.SQLExecutorImpl.getOfflineResultSet for usage.
Improve
- SQLExecutor Offline Job Processing Optimization: Significantly reduced end-to-end latency by enabling immediate result retrieval after critical processing stages of offline jobs executed by SQLExecutor, without waiting for the job to fully complete, thus boosting response speed and resource utilization.
Fixes
- TunnelRetryHandler NPE Fix: Rectified a potential null pointer exception in the getRetryPolicy method when the error code was null.
0.48.6-public
Changelog
[0.48.6-public] - 2024-07-17
Added
- Serializable Support:
  - Key data types like ArrayRecord, Column, TableSchema, and TypeInfo now support serialization and deserialization, enabling caching and inter-process communication.
- Predicate Pushdown:
  - Introduced Attribute type predicates to specify column names.
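
As a rough illustration of what serialization support enables, the sketch below round-trips an object through a byte array with standard Java serialization, which is the kind of step a cache or an inter-process hand-off would perform. ColumnStub is a hypothetical stand-in for an SDK type such as Column; the real classes are not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {
    // Hypothetical stand-in for a Serializable SDK type; not the real Column class.
    static final class ColumnStub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String type;

        ColumnStub(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Serializes any Serializable value into a byte array.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Restores the object from bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ColumnStub copy = (ColumnStub) fromBytes(toBytes(new ColumnStub("id", "BIGINT")));
        System.out.println(copy.name + " " + copy.type);
    }
}
```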
Changed
- Tunnel Interface Refactoring:
  - Refactored Tunnel-related interfaces to include seamless retry logic, greatly enhancing stability and robustness.
  - Removed TunnelRetryStrategy and ConfigurationImpl classes, which are now replaced by TunnelRetryHandler and Configuration respectively.
Improve
- SQLExecutor Optimization:
  - Improved performance when executing offline SQL jobs through the SQLExecutor interface, reducing one network request per job to fetch results, thereby decreasing end-to-end latency.
Fixed
- Decimal Read in Table.read:
  - Fixed issue where trailing zeroes in the decimal type were not as expected in the Table.read interface.
v0.48.5-public
Changelog
[0.48.5-public] - 2024-06-18
Added
- Added the getPartitionSpecs method to the Table interface. Compared to the getPartitions method, this method does not require fetching detailed partition information, resulting in faster execution.
Changes
- Removed the isPrimaryKey method from the Column class. This method was initially added to support users in specifying certain columns as primary keys when creating a table. However, it was found to be misleading in read scenarios, as it does not communicate with the server. Therefore, it is not suitable for determining whether a column is a primary key. Moreover, when using this method for table creation, primary keys should be table-level fields (since primary keys are ordered), and this method neglected the order of primary keys, leading to a flawed design. Hence, it will be removed in version 0.48.6.

  For read scenarios, users should use the Table.getPrimaryKey() method to retrieve primary keys. For table creation, users can now use the withPrimaryKeys method in the TableCreator to specify primary keys during table creation.
Fixes
- Fixed an issue in the RecordConverter where formatting a Record of type String would throw an exception when the data type was byte[].
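
The RecordConverter fix concerns a STRING field whose value may arrive either as a String or as a byte[]. A minimal sketch of handling both representations is shown below. It is not the SDK's implementation, the class name is hypothetical, and decoding the bytes as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not SDK code.
final class StringFieldSketch {
    // Accepts either representation of a STRING value and returns its text.
    static String format(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof byte[]) {
            // Assumption: the bytes hold UTF-8 encoded text.
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException("Unsupported STRING value: " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(format("abc"));
        System.out.println(format("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```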
v0.48.4-public
Changelog
[0.48.4-public] - 2024-06-04
New
- Writing MaxCompute tables with table-api now supports the JSON and TIMESTAMP_NTZ types
- odps-sdk-udf functions continue to be improved
Change
- When the Table.read() interface encounters the Decimal type, it now removes trailing zeros by default (but does not use scientific notation)
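
The combination described above (trailing zeros removed, no scientific notation) can be reproduced with the JDK's BigDecimal. The sketch below shows the rendering only and makes no claim about how Table.read() implements it; the class name is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helper, not SDK code.
final class DecimalSketch {
    // stripTrailingZeros alone may yield scientific notation in toString();
    // toPlainString keeps the plain decimal form.
    static String render(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(render(new BigDecimal("1.2300"))); // 1.23
        System.out.println(render(new BigDecimal("100")));    // 100
        System.out.println(new BigDecimal("100").stripTrailingZeros()); // 1E+2
    }
}
```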
Fix
- Fixed an issue where ArrayRecord did not support the getBytes method for the JSON type
v0.48.3-public
Changelog
[0.48.3-public] - 2024-05-21
Added
- Support for passing retryStrategy when building UpsertSession.
Changed
- The onFlushFail(String, int) interface in UpsertStream.Listener has been marked as @Deprecated in favor of the onFlushFail(Throwable, int) interface. This interface will be removed in version 0.50.0.
- Default compression algorithm for Tunnel upsert has been changed to ODPS_LZ4_FRAME.
Fixed
- Fixed an issue where data couldn't be written correctly in Tunnel upsert when the compression algorithm was set to something other than ZLIB.
- Fixed a resource leak in UpsertSession that could persist for a long time if close was not explicitly called by the user.
- Fixed an exception thrown by Tunnel data retrieval interfaces (preview, download) when encountering invalid Decimal values (such as inf, nan) in tables; they now return null to align with the getResult interface.
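
The last fix replaces an exception with a null result for values that are not valid decimals. The sketch below shows that pattern with the JDK's BigDecimal, which rejects text such as inf and nan. How the Tunnel interfaces actually decode Decimal values is not described in the changelog, so the string-based parsing and the class name are assumptions.

```java
import java.math.BigDecimal;

// Hypothetical helper, not SDK code.
final class DecimalParseSketch {
    // Returns null instead of throwing when the text is not a valid decimal.
    static BigDecimal parseOrNull(String text) {
        if (text == null) {
            return null;
        }
        try {
            return new BigDecimal(text.trim());
        } catch (NumberFormatException e) {
            return null; // e.g. "inf" or "nan"
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("1.50")); // 1.50
        System.out.println(parseOrNull("inf"));  // null
        System.out.println(parseOrNull("nan"));  // null
    }
}
```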
v0.48.2-public
Changelog
[0.48.2-public] - 2024-05-08
Important fixes
Fixed an issue where Tunnel upsert relied on the user's local time zone when bucketing primary keys of DATE and DATETIME types. This could lead to incorrect bucketing and abnormal query results. Users who rely on this feature are strongly recommended to upgrade to version 0.48.2.
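
To see why a dependence on the local time zone can change a bucketing result, consider turning a calendar date into a numeric key. The sketch below is not the SDK's bucketing code, and the class and method names are hypothetical. It only shows that a key derived through a time zone differs between zones, while a key derived from the calendar date alone does not, so any hash computed from the first kind of key can place the same row in different buckets on different machines.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not SDK code.
final class BucketKeySketch {
    // Zone-dependent: midnight of the same calendar date is a different instant per zone.
    static long zoneDependentMillis(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // Zone-independent: days since 1970-01-01 for the calendar date itself.
    static long zoneIndependentDays(LocalDate date) {
        return date.toEpochDay();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 5, 8);
        long utc = zoneDependentMillis(date, ZoneOffset.UTC);
        long plus8 = zoneDependentMillis(date, ZoneOffset.ofHours(8));
        System.out.println(utc - plus8);               // 28800000 (8 hours apart)
        System.out.println(zoneIndependentDays(date)); // same value in every zone
    }
}
```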
Added
Table adds a method getTableLifecycleConfig() to obtain the lifecycle configuration of hierarchical storage.
TableReadSession now supports predicate pushdown