Releases: aliyun/aliyun-odps-java-sdk

v0.51.10-SNAPSHOT

06 Mar 08:29

v0.51.10-SNAPSHOT Pre-release

Release Note: Data Consistency Enhancement for Struct Handling (v0.51.10-SNAPSHOT)


Background

The Tunnel SDK transmits data to the server based on column indices (not column names). This requires users to construct Record objects strictly following the column order defined in the TableSchema bound to the current session. However, this approach introduces complexity when building nested Struct types, as field order mismatches can lead to data inconsistency.


Solutions

1. ReorderableStruct for Flexible Struct Construction

Problem: Directly constructing Struct via field names may result in order mismatches with the target schema.

Solution:
Introducing ReorderableStruct (implements Struct interface) with the following features:

  • Constructor:
    public ReorderableStruct(StructTypeInfo type)
  • Field Setting Methods:
    public void setFieldValue(String fieldName, Object value) // Set by field name (case-insensitive)
    public void setFieldValue(int index, Object value)        // Set by index
  • Example:
    ReorderableStruct person = new ReorderableStruct(personStructType);
    person.setFieldValue("money", 1234L); // Field order irrelevant
    person.setFieldValue("age", 25);
    person.setFieldValue("name", "Jason");
  • Behavior:
    Ensures internal field order aligns with the schema, even if fields are set out-of-order.
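The behavior described above can be pictured with a minimal, self-contained sketch. It is not the SDK's implementation: SchemaOrderedStruct is a hypothetical stand-in that takes a plain list of field names instead of a StructTypeInfo, and it only illustrates how a name-keyed setter can keep values in schema order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in (not an SDK class): stores values in schema order. */
final class SchemaOrderedStruct {
    private final List<String> fieldNames;                  // schema order
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final Object[] values;                          // one slot per schema field

    SchemaOrderedStruct(List<String> schemaFieldNames) {
        this.fieldNames = new ArrayList<>(schemaFieldNames);
        this.values = new Object[fieldNames.size()];
        for (int i = 0; i < fieldNames.size(); i++) {
            // Lowercase keys make lookups case-insensitive.
            indexByName.put(fieldNames.get(i).toLowerCase(Locale.ROOT), i);
        }
    }

    /** Set by field name: the value lands in the slot the schema assigns to that name. */
    void setFieldValue(String fieldName, Object value) {
        Integer index = indexByName.get(fieldName.toLowerCase(Locale.ROOT));
        if (index == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        values[index] = value;
    }

    /** Set by index: the index refers to the schema position. */
    void setFieldValue(int index, Object value) {
        values[index] = value;
    }

    /** Values in schema order, regardless of the order in which they were set. */
    List<Object> getFieldValues() {
        return Arrays.asList(values.clone());
    }
}
```

With a schema of (name, age, money), setting money, then age, then name still yields the values in the order name, age, money.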

2. ReorderableRecord for Automatic Schema Alignment

Problem: Structs constructed outside the SDK may have field orders incompatible with the target schema.

Solution:
Introducing ReorderableRecord (extends ArrayRecord), which automatically reorders Struct fields (including nested types like Array<Struct> or Map<Struct>) to match the schema:

  • Usage:
    TableTunnel.StreamUploadSession uploadSession;
    Struct upstreamData;
    Record record = new ReorderableRecord(uploadSession.getSchema()); // Bind to schema
    record.set("struct", upstreamData); // Automatic reordering
  • Performance Note:
    Reordering incurs additional computational overhead. Use this approach only when schema alignment is uncertain.

Key Constraints

The reorder method enforces strict schema consistency:

  • Field names, data types, and nested structures (including maps/arrays) must exactly match between the Struct and target schema (order-independent).
  • Exceptions:
    • IllegalArgumentException is thrown for mismatches (e.g., missing fields, type conflicts, or nested schema inconsistencies).
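As a rough illustration of these constraints, the sketch below reorders a struct, modeled here as a map from field name to value, to match a target schema, and rejects name mismatches with IllegalArgumentException. Everything in it (StructReorderSketch, Schema, reorder) is hypothetical and simplified: it checks field names and nested structs only, not data types, arrays, or maps, and it is not the SDK's reorder method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not SDK code): order-independent matching, schema-ordered output. */
final class StructReorderSketch {

    /** Ordered field names; a non-null value means the field is itself a struct. */
    static final class Schema {
        final LinkedHashMap<String, Schema> fields = new LinkedHashMap<>();

        Schema field(String name) {
            fields.put(name, null);
            return this;
        }

        Schema field(String name, Schema nested) {
            fields.put(name, nested);
            return this;
        }
    }

    /** Returns a copy of source whose fields follow the order of target, recursively. */
    static LinkedHashMap<String, Object> reorder(Map<String, Object> source, Schema target) {
        // Set equality ignores order, so only missing or extra fields are rejected.
        if (!source.keySet().equals(target.fields.keySet())) {
            throw new IllegalArgumentException(
                "Field names do not match: " + source.keySet() + " vs " + target.fields.keySet());
        }
        LinkedHashMap<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Schema> field : target.fields.entrySet()) {
            Object value = source.get(field.getKey());
            Schema nested = field.getValue();
            if (nested != null && value != null) {
                if (!(value instanceof Map)) {
                    throw new IllegalArgumentException(
                        "Field " + field.getKey() + " must be a struct");
                }
                @SuppressWarnings("unchecked")
                Map<String, Object> nestedValue = (Map<String, Object>) value;
                value = reorder(nestedValue, nested);   // align nested structs too
            }
            result.put(field.getKey(), value);
        }
        return result;
    }
}
```

Because the result is rebuilt field by field, the cost grows with the number of fields and the nesting depth, which is consistent with the performance note above.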

v0.51.9-public

26 Feb 03:16

[0.51.9-public] - 2025-02-26

Fixes

  • Struct Field Escaping
    Fixed getName(true) in TypeInfo not adding backticks to nested struct field names.

v0.51.8-public

20 Feb 09:30

[0.51.8-public] - 2025-02-20

Changes

  • Record: The set(String columnName, Object value) method now ignores the case of columnName, and the getColumn method always returns column names in lowercase.
    ⚠️ Compatibility Note: This change affects the performance of ArrayRecord initialization and setByName operations, so users should run their own performance tests. Version 0.52.3 adds a toggle to disable this behavior.

Features

  • Table: Added getMetadataJson and getExtendedInfoJson methods.
  • Partition: Added getMetadataJson, getExtendedInfoJson, getCdcSize, and getCdcRecordNum methods.
  • CommandApi: Enhanced the DescribeTableCommand to include additional MetadataJson and ExtendedInfoJson fields in the response.
  • PartitionSpec: Improved error messages for build failures to provide clearer debugging information.

v0.51.7-public

13 Feb 09:59

[0.51.7-public] - 2025-02-13

Features

  • EPV2: Added support for EPV2 (External Project V2), including the ListTable, ListSchema, and DescribeTable interfaces.
    ⚠️ Compatibility Note: This slightly reduces the performance of these interfaces (functionality is unaffected). Version 0.52.3 adds a configuration option to turn this feature off.
  • MCQA: Added logging for the fallback that occurs when retrieving results via InstanceTunnel fails.

v0.51.6-public

26 Jan 06:42

[0.51.6-public] - 2025-01-26

Fixes

  • TypeInfo: Fixed an issue where getTypeName(true) did not quote the field names of a StructTypeInfo nested within an ArrayTypeInfo or MapTypeInfo.

v0.51.5-public

15 Jan 03:05

[0.51.5-public] - 2025-01-14

Fixes

  • MCQA2: Fixed an issue where MCQA2 jobs might not throw exceptions correctly when results are retrieved through the instance tunnel.

v0.51.4-public

14 Jan 03:21

Changelog

[0.51.4-public] - 2025-01-14

Features

  • MCQA2: Added several optimizations to improve the execution efficiency of MCQA2 jobs. MCQA2 jobs now use ExecuteMode.INTERACTIVE_V2, which distinguishes them from MCQA1 jobs using ExecuteMode.INTERACTIVE.
  • SQLExecutor: Added new getExecuteMode method to retrieve job execution mode

Changes

  • UpsertStream: In version 0.51.0, the signature of the close method was modified (no longer throwing TunnelException). This version restores it to maintain API compatibility.
  • ClusterInfo: The toString method was changed in version 0.51.0. This version restores it to maintain API compatibility.
  • TunnelRetryStrategy and ConfigurationImpl classes: These were removed in version 0.48.6. This version restores them (though they won't have any effect!) to maintain API compatibility.

v0.51.3-public

07 Jan 07:20

Changelog

[0.51.3-public] - 2025-01-07

Features

  • MCQA2: SQLExecutorImpl adds a new setProject method to specify the default project used for submitting jobs.

Changes

  • StreamTunnel: When the append method is called with a Record that has more columns than the Session Schema, it now throws a SchemaMismatchException (which extends IOException) instead of a plain IOException. The error message has also been improved.
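
Because SchemaMismatchException extends IOException, existing catch (IOException e) blocks keep working, and callers who want to treat schema problems separately can add a more specific catch before the general one. The sketch below shows that ordering with local stand-ins: the nested SchemaMismatchException class and the append method here are hypothetical placeholders, not the SDK's classes.

```java
import java.io.IOException;

/** Hypothetical sketch (not SDK code): handling a subclass of IOException first. */
final class AppendErrorHandlingSketch {

    /** Local stand-in for the SDK exception, which the release note says extends IOException. */
    static class SchemaMismatchException extends IOException {
        SchemaMismatchException(String message) {
            super(message);
        }
    }

    /** Hypothetical append: rejects a record with more columns than the session schema. */
    static void append(int recordColumns, int schemaColumns) throws IOException {
        if (recordColumns > schemaColumns) {
            throw new SchemaMismatchException(
                "Record has " + recordColumns + " columns, schema has " + schemaColumns);
        }
    }

    /** The specific catch must come before the general IOException catch. */
    static String tryAppend(int recordColumns, int schemaColumns) {
        try {
            append(recordColumns, schemaColumns);
            return "ok";
        } catch (SchemaMismatchException e) {
            return "schema-mismatch";   // e.g. rebuild the record against the session schema
        } catch (IOException e) {
            return "io-error";          // e.g. retry or surface the failure
        }
    }
}
```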

v0.51.2-public

20 Dec 07:02

[0.51.2-public] - 2024-12-20

Features

  • Authorization: Introduced the credential-java authorization package, now supporting authentication with AlibabaCloudCredentialsProvider.
  • StreamUploadSession: Added awareness for Slot updates and automatic retry logic.
  • table-api: Introduced the TableRetryHandler class, adding retry logic to the table-api.
  • udf: The InputSplitter now includes the method setLimit.

Changes

  • TypeInfo: The StructTypeInfo class now includes the method getTypeName(boolean quote). In version 0.51.0-public (rc0), StructTypeInfo quoted field names with backticks by default. Because that change may affect users, this version reverts to the original behavior (no quoting by default). Call getTypeName(true) when quoting is needed.

Fixes

  • TypeInfo: Field names will now be correctly escaped when quoted with backticks.
  • MCQA2: Fixed an issue where the getRawTaskResults interface call in MCQA2 jobs could not retrieve results.
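
To make the quoting and escaping behavior concrete, here is a small stand-alone sketch. It rests on assumptions that this release note does not confirm: that an embedded backtick is escaped by doubling it, and that a struct type name is rendered as STRUCT<name:TYPE,...>. BacktickQuoteSketch and its methods are hypothetical helpers, not SDK APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not SDK code): optional backtick quoting of struct field names. */
final class BacktickQuoteSketch {

    /** Wraps a field name in backticks, doubling embedded backticks (assumed escape rule). */
    static String quote(String fieldName) {
        return "`" + fieldName.replace("`", "``") + "`";
    }

    /** Renders a struct type name; field names are quoted only when quote is true. */
    static String structTypeName(List<String> names, List<String> types, boolean quote) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            String name = quote ? quote(names.get(i)) : names.get(i);
            parts.add(name + ":" + types.get(i));
        }
        return "STRUCT<" + String.join(",", parts) + ">";
    }
}
```

In this sketch, passing false mirrors the restored default of leaving names unquoted, and passing true mirrors the opt-in quoting that getTypeName(true) provides.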

v0.51.0-public

05 Dec 07:13

Changelog

[0.51.0-public] - 2024-12-05

Features

  • MapReduce: Supports multi pipeline output.
  • VolumeBuilder: Added the accelerate method to speed up the download process using dragonfly when the external volume is too large.
  • Table: Introduced TableType OBJECT_TABLE and the method isObjectTable to verify it.
  • Project: The list method now includes a filter condition enableDr to filter projects based on whether data disaster recovery is enabled.
  • Cluster: New fields added: clusterRole, jobDataPath, and zoneId.

Changes

  • TableBatchReadSession: The predicate class variable is now set to transient.
  • Attribute: Added escaping logic; it will no longer double quote.
  • SQLTask: Restored the SQLTask.run(Odps odps, String project, String sql, String taskName, Map<String, String> hints, Map<String, String> aliases, int priority) method removed in version 0.49.0 to resolve potential interface conflicts when users' MR jobs depend on older versions of the SDK.

Fixes

  • Table.changeOwner: Fixed SQL spelling error.
  • Instance.getTaskSummary: Removed unreasonable debug logging introduced since version 0.50.2.
  • TruncTime: Uses backticks to quote columnName during table creation/toString.

Note: This version also includes all changes from "0.51.0-public.rc0" and "0.51.0-public.rc1".

