
Releases: aliyun/aliyun-odps-java-sdk

v0.50.0-rc1

19 Sep 06:30

Pre-release

Changelog

[0.50.0-rc1] - 2024-09-19

Features

  • SQLExecutor added isUseInstanceTunnel method:
    • Used to determine whether to use instanceTunnel to obtain results

Fix

  • Fixed an issue where, when SQLExecutor ran MCQA 2.0 jobs, executing a CommandApi task affected the next job and caused an NPE when retrieving its results.

v0.50.0-rc0

18 Sep 11:27

Pre-release

Changelog

[0.50.0-rc0] - 2024-09-18

Features

  • SQLExecutor supports submitting MCQA 2.0 jobs
    • SQLExecutorBuilder adds method enableMcqaV2
    • SQLExecutorBuilder adds getter methods for fields
  • SQLExecutor adds getQueryId method:
    • For offline jobs and MCQA 2.0 jobs, it returns the currently executing job's InstanceId
    • For MCQA 1.0 jobs, it returns the InstanceId and SubQueryId
  • TableAPI adds a SharingQuotaToken parameter to EnvironmentSettings so that a job submission can carry a temporary credential for sharing quota resources
  • Quotas introduces getWlmQuota method:
    • Allows retrieval of detailed quota information based on projectName and quotaNickName, including whether it belongs to interactive quotas
  • Quota class adds isInteractiveQuota method to determine if a quota belongs to interactive quotas (suitable for MCQA 2.0)
  • Adds getResultByInstanceTunnel(Instance instance, String taskName, Long limit, boolean limitEnabled) method:
    • Allows unlimited retrieval of results via instanceTunnel (lifting restrictions requires higher permissions)
  • UpsertSession.Builder adds setLifecycle method to configure the session lifecycle

Fixes

  • Fixed the issue where using SQLExecutor to execute offline jobs with limitEnabled specified resulted in no effect
  • Modified the SQLExecutor so that getQueryId method returns the job's instanceID instead of null when executing offline jobs
  • When SQLExecutor runs an offline job and encounters a non-select statement, retrieving results via instanceTunnel no longer throws an exception and instead falls back to the non-tunnel logic
  • Fixed an issue where DownloadSession dropped one record while rebuilding after an error that occurred exactly when the number of records read equaled the number of records to read minus one
  • The clone method of the Odps class now correctly clones other fields, including tunnelEndpoint
  • The Instance's getRawTaskResults method now does not make multiple requests when processing synchronous jobs

v0.49.0-public

12 Sep 07:38

Changelog

[0.49.0-public] - 2024-09-12

Features

  • OdpsRecordConverter Enhancement: Now supports converting data to SQL-compatible formats. For
    example, for the LocalDate type, data can be converted to "DATE 'yyyy-mm-dd'" format.
    Additionally, for the Binary type, hex representation format is now supported.

  • Enhanced Predicate Pushdown for Storage Constants: Improved the behavior of the Constant
    class and added the Constant.of(Object, TypeInfo) method. Now, when setting or identifying types
    as time types, the conversion to SQL-compatible format can be done correctly (enabling correct
    pushdown of time types). Other type conversion issues have been fixed;
    an IllegalArgumentException will be thrown during session creation when conversion to
    SQL-compatible mode is not possible.

  • UpsertSession Implements Closeable Interface: Reminds users to properly release the local
    resources of the UpsertSession.

  • SQLExecutorBuilder New Method offlineJobPriority: Allows setting the priority of the offline
    job that is used when a job falls back to offline execution.

  • New Method in Table Class getLastMajorCompactTime: Used to retrieve the last time the table
    underwent major compaction.

  • New Method in Instance Class create(Job job, boolean tryWait): When the tryWait parameter
    is true, the job will attempt to wait on the server for a period of time to obtain results more
    quickly.

  • Resource Class Enhancement: Now able to determine if the corresponding resource is a temporary
    resource.

  • CreateProjectParma Class Enhancement: Added the defaultCtrlService parameter to specify the
    default control cluster of the project.
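
The SQL-compatible conversion described for OdpsRecordConverter and Constant can be pictured with a small standalone sketch. It uses only the JDK and is not the SDK's implementation: the class and method names (SqlLiteralSketch, toSqlLiteral, toHex) are hypothetical, and the lowercase hex output for binary data is an assumption, since the changelog does not state the exact format. It does mirror two documented behaviors: a LocalDate becomes "DATE 'yyyy-mm-dd'", and a value with no SQL-compatible form raises IllegalArgumentException.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not part of aliyun-odps-java-sdk.
final class SqlLiteralSketch {

    // Converts a Java value to a SQL-compatible string, or throws if no form is known.
    static String toSqlLiteral(Object value) {
        if (value instanceof LocalDate) {
            // ISO_LOCAL_DATE renders yyyy-MM-dd, matching the DATE 'yyyy-mm-dd' form.
            return "DATE '" + ((LocalDate) value).format(DateTimeFormatter.ISO_LOCAL_DATE) + "'";
        }
        if (value instanceof byte[]) {
            return toHex((byte[]) value); // assumed hex style: lowercase, no prefix
        }
        if (value instanceof Number) {
            return value.toString();
        }
        throw new IllegalArgumentException("No SQL-compatible form for value: " + value);
    }

    // Renders each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSqlLiteral(LocalDate.of(2024, 9, 12)));     // DATE '2024-09-12'
        System.out.println(toSqlLiteral(new byte[] {0x0A, (byte) 0xFF})); // 0aff
    }
}
```

Failing fast with IllegalArgumentException, as the changelog describes for session creation, surfaces an unsupported constant before any predicate is pushed down rather than at read time.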

Fixes

  • UpsertStream NPE Fix: Fixed an issue where an NPE was thrown during flush when a local error
    occurred, preventing a proper retry.

  • Varchar/Char Type Fix: Fixed an issue where computing the length of a Varchar/Char value
    counted special characters, such as Chinese symbols or emoticons, twice.
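
The changelog does not say how the double counting arose. As a general illustration, one common way for a character to be counted twice in Java is to measure length in UTF-16 code units instead of code points, which affects emoticons outside the Basic Multilingual Plane. The sketch below shows that difference using only the JDK; CharLengthSketch is a hypothetical name, and this is an assumption about a typical cause, not a description of the SDK's code.

```java
// Hypothetical illustration only; not part of aliyun-odps-java-sdk.
final class CharLengthSketch {

    // Counts UTF-16 code units: a supplementary character (surrogate pair) counts as 2.
    static int utf16Units(String s) {
        return s.length();
    }

    // Counts Unicode code points: every character counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00"; // "a" followed by the emoji U+1F600
        System.out.println(utf16Units(text)); // 3
        System.out.println(codePoints(text)); // 2
    }
}
```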


v0.48.8-public

12 Aug 11:32

Changelog

[0.48.8-public] - 2024-08-12

Enhancement

  • Introduced internal validation of compound predicate expressions, fixed logic when handling
    invalid or always true/false predicates, enhanced test coverage, and ensured stability and
    accuracy in complex query optimization.

v0.48.7-public

07 Aug 02:54

Changelog

[0.48.7-public] - 2024-08-07

Enhancements

  • TableTunnel Configuration Optimization: Introduced the tags attribute to TableTunnel Configuration, enabling users to attach custom tags to tunnel operations for enhanced logging and management. These tags are recorded in the tenant-level information schema.
    Odps odps; // assumed to be initialized elsewhere
    Configuration configuration =
        Configuration.builder(odps)
                     .withTags(Arrays.asList("tag1", "tag2")) // custom tags attached to tunnel operations
                     .build();
    TableTunnel tableTunnel = odps.tableTunnel(configuration);
    // Proceed with tunnel operations
  • Instance Enhancement: Added the waitForTerminatedAndGetResult method to the Instance class, integrating optimization strategies from versions 0.48.6 and 0.48.7 for the SQLExecutor interface, enhancing operational efficiency. Refer to com.aliyun.odps.sqa.SQLExecutorImpl.getOfflineResultSet for usage.

Improve

  • SQLExecutor Offline Job Processing Optimization: Significantly reduced end-to-end latency by enabling immediate result retrieval after critical processing stages of offline jobs executed by SQLExecutor, without waiting for the job to fully complete, thus boosting response speed and resource utilization.

Fixes

  • TunnelRetryHandler NPE Fix: Fixed a potential null pointer exception in the getRetryPolicy method when the error code was null.

0.48.6-public

17 Jul 12:20

Changelog

[0.48.6-public] - 2024-07-17

Added

  • Serializable Support:
    • Key data types like ArrayRecord, Column, TableSchema, and TypeInfo now support serialization and deserialization, enabling caching and inter-process communication.
  • Predicate Pushdown:
    • Introduced Attribute type predicates to specify column names.
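
What serialization support enables, in general terms, is a byte-level round trip such as the one sketched below. The sketch uses standard Java serialization with a hypothetical stand-in class (ColumnStub) rather than the SDK's Column or TableSchema, so that it runs without the SDK on the classpath. Whether the SDK types rely on plain java.io.Serializable in exactly this way is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration only; ColumnStub stands in for an SDK type.
final class SerializationRoundTrip {

    static final class ColumnStub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String typeName;

        ColumnStub(String name, String typeName) {
            this.name = name;
            this.typeName = typeName;
        }
    }

    // Serializes a value to bytes, e.g. for a cache entry or a message to another process.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores a value from bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ColumnStub copy = (ColumnStub) fromBytes(toBytes(new ColumnStub("id", "BIGINT")));
        System.out.println(copy.name + " " + copy.typeName); // id BIGINT
    }
}
```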

Changed

  • Tunnel Interface Refactoring:
    • Refactored Tunnel-related interfaces to include seamless retry logic, greatly enhancing stability and robustness.
    • Removed TunnelRetryStrategy and ConfigurationImpl classes, which are now replaced by TunnelRetryHandler and Configuration respectively.
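
The changelog does not describe how TunnelRetryHandler works internally. As a generic illustration of "seamless" retry logic, the sketch below wraps an operation so that callers see only the final result or the last failure. RetrySketch and withRetry are hypothetical names; a production handler would typically also wait between attempts and retry only on errors it classifies as transient.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the SDK's TunnelRetryHandler.
final class RetrySketch {

    // Runs the operation up to maxAttempts times and returns the first successful result.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls"); // ok after 3 calls
    }
}
```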

Improve

  • SQLExecutor Optimization:
    • Improved performance when executing offline SQL jobs through the SQLExecutor interface, reducing one network request per job to fetch results, thereby decreasing end-to-end latency.

Fixed

  • Decimal Read in Table.read:
    • Fixed issue where trailing zeroes in the decimal type were not as expected in the Table.read interface.

v0.48.5-public

18 Jun 03:41

Changelog

[0.48.5-public] - 2024-06-18

Added

  • Added the getPartitionSpecs method to the Table interface. Compared to the getPartitions method, this method does not require fetching detailed partition information, resulting in faster execution.

Changes

  • Removed the isPrimaryKey method from the Column class. This method was initially added to support users in specifying certain columns as primary keys when creating a table. However, it was found to be misleading in read scenarios, as it does not communicate with the server. Therefore, it is not suitable for determining whether a column is a primary key. Moreover, when using this method for table creation, primary keys should be table-level fields (since primary keys are ordered), and this method neglected the order of primary keys, leading to a flawed design. Hence, it will be removed in version 0.48.6.

    For read scenarios, users should use the Table.getPrimaryKey() method to retrieve primary keys. For table creation, users can now use the withPrimaryKeys method in the TableCreator to specify primary keys during table creation.

Fixes

  • Fixed an issue in the RecordConverter where formatting a Record of type String would throw an exception when the data type was byte[].

v0.48.4-public

03 Jun 12:27

Changelog

[0.48.4-public] - 2024-06-04

New

  • Writing MaxCompute tables through table-api now supports the JSON and TIMESTAMP_NTZ types
  • Continued improvements to odps-sdk-udf functionality

Change

  • When the Table.read() interface encounters the Decimal type, it now removes trailing zeros by default (without switching to scientific notation)
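
The distinction between removing trailing zeros and avoiding scientific notation matters in Java because BigDecimal.stripTrailingZeros() can produce a negative scale, which toString() renders with an exponent. A minimal JDK-only sketch of the described output format follows; DecimalTrailingZeros and render are hypothetical names, and the SDK's own code path may differ.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not part of aliyun-odps-java-sdk.
final class DecimalTrailingZeros {

    // Drops trailing zeros and renders the value without an exponent.
    static String render(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(render(new BigDecimal("1.2300"))); // 1.23
        System.out.println(render(new BigDecimal("100.00"))); // 100
        // toString() on the stripped value would use scientific notation instead:
        System.out.println(new BigDecimal("100.00").stripTrailingZeros()); // 1E+2
    }
}
```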

Fix

  • Fixed an issue where ArrayRecord did not support the getBytes method for the JSON type

v0.48.3-public

21 May 08:03

Changelog

[0.48.3-public] - 2024-05-21

Added

  • Support for passing retryStrategy when building UpsertSession.

Changed

  • The onFlushFail(String, int) interface in UpsertStream.Listener has been marked as @Deprecated in favor of the onFlushFail(Throwable, int) interface. The deprecated interface will be removed in version 0.50.0.
  • Default compression algorithm for Tunnel upsert has been changed to ODPS_LZ4_FRAME.

Fixed

  • Fixed an issue where data couldn't be written correctly in Tunnel upsert when the compression algorithm was set to something other than ZLIB.
  • Fixed a resource leak in UpsertSession that could persist for a long time if close was not explicitly called by the user.
  • Fixed an exception thrown by Tunnel data retrieval interfaces (preview, download) when encountering invalid Decimal types (such as inf, nan) in tables; will now return null to align with the getResult interface.
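
The "return null instead of throwing" behavior for invalid decimals can be sketched with the JDK alone, since java.math.BigDecimal has no representation for infinity or NaN and rejects such text with NumberFormatException. LenientDecimal and parseOrNull are hypothetical names, and parsing from text is an assumption made for illustration; the changelog does not say how Tunnel encodes these values.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not part of aliyun-odps-java-sdk.
final class LenientDecimal {

    // Parses a decimal, mapping unrepresentable input such as "inf" or "nan" to null.
    static BigDecimal parseOrNull(String text) {
        try {
            return new BigDecimal(text.trim());
        } catch (NumberFormatException e) {
            return null; // surfaced as null rather than as an exception
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("12.50")); // 12.50
        System.out.println(parseOrNull("nan"));   // null
        System.out.println(parseOrNull("inf"));   // null
    }
}
```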

v0.48.2-public

08 May 04:05

Changelog

[0.48.2-public] - 2024-05-08

Important fixes

Fixed an issue where Tunnel upsert relied on the user's local time zone when bucketing primary keys of the DATE and DATETIME types. This could lead to incorrect bucketing and abnormal query results. Users who rely on this feature are strongly advised to upgrade to version 0.48.2.
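
To see why a local time zone can change a bucket assignment, consider deriving a day number from a DATE value. The sketch below, which uses only java.time, contrasts a zone-dependent derivation with a zone-independent one. DateBucketSketch and its methods are hypothetical, and the modulo bucket function is a placeholder: the changelog does not describe the SDK's actual hashing or bucketing.

```java
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical illustration only; not the SDK's bucketing code.
final class DateBucketSketch {

    // Zone-dependent: midnight in the given zone is converted to UTC millis first.
    static long zoneDependentDay(LocalDate date, ZoneId zone) {
        long millis = date.atStartOfDay(zone).toInstant().toEpochMilli();
        return Math.floorDiv(millis, 86_400_000L);
    }

    // Zone-independent: days since 1970-01-01, the same on every machine.
    static long zoneIndependentDay(LocalDate date) {
        return date.toEpochDay();
    }

    // Placeholder bucket function (assumption): day number modulo bucket count.
    static int bucket(long day, int bucketCount) {
        return (int) Math.floorMod(day, (long) bucketCount);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 5, 8);
        long utcDay = zoneDependentDay(date, ZoneId.of("UTC"));
        long shanghaiDay = zoneDependentDay(date, ZoneId.of("Asia/Shanghai"));
        // The two clients disagree by one day, so they may pick different buckets.
        System.out.println(utcDay - shanghaiDay); // 1
        System.out.println(zoneIndependentDay(date) == utcDay); // true
    }
}
```

With a zone-dependent day number, the same key written from clients in different time zones can land in different buckets, which is consistent with the abnormal query results the fix describes.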

Added

  • Table adds a method getTableLifecycleConfig() to obtain the lifecycle configuration of hierarchical storage.
  • TableReadSession now supports predicate pushdown.
