Skip to content

[common] Handle single-element union in superset schema generation#2734

Open
kvargha wants to merge 3 commits intolinkedin:mainfrom
kvargha:kvargha/fix-superset-schema-single-element-union
Open

[common] Handle single-element union in superset schema generation#2734
kvargha wants to merge 3 commits intolinkedin:mainfrom
kvargha:kvargha/fix-superset-schema-single-element-union

Conversation

@kvargha
Copy link
Copy Markdown
Contributor

@kvargha kvargha commented Apr 14, 2026

Problem Statement

generateSupersetSchema() throws VeniceException("Incompatible schema") when two record schemas have the same field where one version uses a single-element union (e.g. [{"type":"array","items":"long"}]) and the other uses the bare type ({"type":"array","items":"long"}). These are semantically equivalent in Avro, but Schema.getType() returns UNION for the wrapper form and ARRAY for the bare form, so the type check at the top of generateSupersetSchema() rejects them as incompatible.

This blocks superset schema generation for any store whose value schema evolved between these two representations.

Solution

Add unwrapSingleElementUnion() which normalizes a single-element union [T] to its inner type T. This runs at the start of generateSupersetSchema() before the type comparison, so all downstream logic (type switch, recursive merge) operates on the unwrapped types.

The unwrapping is a no-op for schemas that are not single-element unions, so existing behavior is unchanged for all other schema types.

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Three new unit tests:

  • testSchemaMergeSingleElementUnionVsPlainType — single-element union array vs plain array, both directions
  • testSchemaMergeSingleElementUnionVsPlainTypeWithAdditionalFields — same with additional fields to verify superset field merging still works
  • testSchemaMergeSingleElementUnionMap — single-element union map vs plain map

All 31 existing TestAvroSupersetSchemaUtils tests pass (no regressions).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

`generateSupersetSchema()` now unwraps single-element unions before
comparing schema types. A union of one type `[T]` is semantically
equivalent to `T` in Avro, but `Schema.getType()` returns `UNION`
for the wrapper form while the bare form returns the inner type.
Without unwrapping, the type check throws `VeniceException` for
schemas that only differ in whether a field uses `[array]` vs `array`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 14, 2026 18:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Avro superset schema generation to treat single-element unions ([T]) as semantically equivalent to the bare type (T), preventing false “Incompatible schema” failures during schema merges when one version wraps a field type in a one-branch union and another does not.

Changes:

  • Normalize schemas via unwrapSingleElementUnion() at the start of generateSupersetSchema() before type comparison and downstream merging.
  • Add unit tests covering [array] vs array, [map] vs map, and a record merge scenario with additional fields.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
internal/venice-client-common/src/main/java/com/linkedin/venice/utils/AvroSupersetSchemaUtils.java Unwraps single-element unions prior to type checks and merge logic to avoid UNION-vs-inner-type incompatibility errors.
internal/venice-client-common/src/test/java/com/linkedin/venice/schema/TestAvroSupersetSchemaUtils.java Adds regression tests ensuring superset generation succeeds for single-element union wrappers vs plain complex types.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +45 to 55
// Unwrap single-element unions to their inner type so that [T] is treated
// equivalently to T during type comparison and superset generation.
final Schema existing = unwrapSingleElementUnion(existingSchema);
final Schema newer = unwrapSingleElementUnion(newSchema);

if (existing.getType() != newer.getType()) {
throw new VeniceException("Incompatible schema");
}
if (Objects.equals(existingSchema, newSchema)) {
return existingSchema;
if (Objects.equals(existing, newer)) {
return existing;
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generateSupersetSchema() now normalizes [T] to T and can return an unwrapped schema even when one input was a single-element UNION. Downstream callers that use AvroSchemaUtils.compareSchemaIgnoreFieldOrder() (e.g., isSupersetSchema() and generateSupersetSchemaFromAllValueSchemas() when checking whether the generated superset matches an existing schema) will still treat UNION vs the inner type as different and may incorrectly return false / fail to match an existing schema entry, potentially causing unnecessary new superset schema IDs. Consider normalizing single-element unions consistently in the schema comparison path as well (or in isSupersetSchema() before comparing), and add a regression test for that behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +783 to +813
@Test
public void testSchemaMergeSingleElementUnionVsPlainType() {
// v1: opportunityIds is a single-element union wrapping an array
String schemaStr1 = "{\"type\":\"record\",\"name\":\"TestRecord\"," + "\"namespace\":\"com.example\","
+ "\"fields\":[{\"name\":\"opportunityIds\"," + "\"type\":[{\"type\":\"array\",\"items\":\"long\"}],"
+ "\"doc\":\"List of opportunityIds.\"}]}";

// v2: opportunityIds is a plain array (no union wrapper)
String schemaStr2 = "{\"type\":\"record\",\"name\":\"TestRecord\"," + "\"namespace\":\"com.example\","
+ "\"fields\":[{\"name\":\"opportunityIds\"," + "\"type\":{\"type\":\"array\",\"items\":\"long\"},"
+ "\"doc\":\"List of opportunityIds.\"}]}";

Schema s1 = AvroSchemaParseUtils.parseSchemaFromJSONStrictValidation(schemaStr1);
Schema s2 = AvroSchemaParseUtils.parseSchemaFromJSONStrictValidation(schemaStr2);

// Should not throw — these are semantically equivalent
Schema superset12 = AvroSupersetSchemaUtils.generateSupersetSchema(s1, s2);
Assert.assertNotNull(superset12);
Assert.assertNotNull(superset12.getField("opportunityIds"));

// The result should be an array type (the unwrapped form)
Assert.assertEquals(superset12.getField("opportunityIds").schema().getType(), Schema.Type.ARRAY);

// Should also work in the reverse direction
Schema superset21 = AvroSupersetSchemaUtils.generateSupersetSchema(s2, s1);
Assert.assertNotNull(superset21);
Assert.assertEquals(superset21.getField("opportunityIds").schema().getType(), Schema.Type.ARRAY);

// Both orderings should produce equivalent superset schemas
Assert.assertTrue(AvroSchemaUtils.compareSchemaIgnoreFieldOrder(superset12, superset21));
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests validate that generateSupersetSchema() no longer throws for [T] vs T, but they don’t cover the boolean helper isSupersetSchema() (which currently relies on compareSchemaIgnoreFieldOrder, and may still treat single-element UNION vs bare types as different). Adding assertions for isSupersetSchema() in these scenarios would prevent regressions where superset generation succeeds but superset detection/equality checks still fail.

Copilot uses AI. Check for mistakes.
kvargha and others added 2 commits April 14, 2026 13:59
unwrappedExistingSchema / unwrappedNewSchema make it explicit that
these are the post-unwrap versions of the method parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rawExistingSchema/rawNewSchema parameters let existingSchema/newSchema
locals take the unwrapped values, so the rest of generateSupersetSchema
is identical to main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 14, 2026 21:06
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to 52
public static Schema generateSupersetSchema(Schema rawExistingSchema, Schema rawNewSchema) {
// Unwrap single-element unions to their inner type so that [T] is treated
// equivalently to T during type comparison and superset generation.
final Schema existingSchema = unwrapSingleElementUnion(rawExistingSchema);
final Schema newSchema = unwrapSingleElementUnion(rawNewSchema);

if (existingSchema.getType() != newSchema.getType()) {
throw new VeniceException("Incompatible schema");
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unwrapSingleElementUnion() runs before the type check, so a singleton-union like [T] will be converted to T even when the other side is a real UNION (e.g. [T] vs ["null", T]). That turns a previously-mergeable UNION-vs-UNION case into a hard type mismatch (T vs UNION) and throws Incompatible schema. Consider only unwrapping singleton unions when the other schema is not a UNION, or adding a fallback path that retries the merge using the raw UNION schemas when either input was a singleton-union and the other side is UNION.

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +49
public static Schema generateSupersetSchema(Schema rawExistingSchema, Schema rawNewSchema) {
// Unwrap single-element unions to their inner type so that [T] is treated
// equivalently to T during type comparison and superset generation.
final Schema existingSchema = unwrapSingleElementUnion(rawExistingSchema);
final Schema newSchema = unwrapSingleElementUnion(rawNewSchema);

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, generateSupersetSchema() will normalize [T] and T to the same merged output schema, but callers that rely on strict structural equality (e.g. AvroSupersetSchemaUtils.isSupersetSchema() via AvroSchemaUtils.compareSchemaIgnoreFieldOrder) will still treat [T] and T as different and may return false even though the schemas are now merge-compatible. If the intent is to treat singleton-union wrappers as equivalent throughout, isSupersetSchema() (and/or the comparison utility it uses) likely needs the same normalization so behavior is consistent for downstream users.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants