[common] Handle single-element union in superset schema generation#2734
[common] Handle single-element union in superset schema generation#2734kvargha wants to merge 3 commits intolinkedin:mainfrom
Conversation
`generateSupersetSchema()` now unwraps single-element unions before comparing schema types. A union of one type `[T]` is semantically equivalent to `T` in Avro, but `Schema.getType()` returns `UNION` for the wrapper form while the bare form returns the inner type. Without unwrapping, the type check throws `VeniceException` for schemas that only differ in whether a field uses `[array]` vs `array`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes Avro superset schema generation to treat single-element unions ([T]) as semantically equivalent to the bare type (T), preventing false “Incompatible schema” failures during schema merges when one version wraps a field type in a one-branch union and another does not.
Changes:
- Normalize schemas via
unwrapSingleElementUnion()at the start ofgenerateSupersetSchema()before type comparison and downstream merging. - Add unit tests covering
[array]vsarray,[map]vsmap, and a record merge scenario with additional fields.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/venice-client-common/src/main/java/com/linkedin/venice/utils/AvroSupersetSchemaUtils.java | Unwraps single-element unions prior to type checks and merge logic to avoid UNION-vs-inner-type incompatibility errors. |
| internal/venice-client-common/src/test/java/com/linkedin/venice/schema/TestAvroSupersetSchemaUtils.java | Adds regression tests ensuring superset generation succeeds for single-element union wrappers vs plain complex types. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Unwrap single-element unions to their inner type so that [T] is treated | ||
| // equivalently to T during type comparison and superset generation. | ||
| final Schema existing = unwrapSingleElementUnion(existingSchema); | ||
| final Schema newer = unwrapSingleElementUnion(newSchema); | ||
|
|
||
| if (existing.getType() != newer.getType()) { | ||
| throw new VeniceException("Incompatible schema"); | ||
| } | ||
| if (Objects.equals(existingSchema, newSchema)) { | ||
| return existingSchema; | ||
| if (Objects.equals(existing, newer)) { | ||
| return existing; | ||
| } |
There was a problem hiding this comment.
generateSupersetSchema() now normalizes [T] to T and can return an unwrapped schema even when one input was a single-element UNION. Downstream callers that use AvroSchemaUtils.compareSchemaIgnoreFieldOrder() (e.g., isSupersetSchema() and generateSupersetSchemaFromAllValueSchemas() when checking whether the generated superset matches an existing schema) will still treat UNION vs the inner type as different and may incorrectly return false / fail to match an existing schema entry, potentially causing unnecessary new superset schema IDs. Consider normalizing single-element unions consistently in the schema comparison path as well (or in isSupersetSchema() before comparing), and add a regression test for that behavior.
| @Test | ||
| public void testSchemaMergeSingleElementUnionVsPlainType() { | ||
| // v1: opportunityIds is a single-element union wrapping an array | ||
| String schemaStr1 = "{\"type\":\"record\",\"name\":\"TestRecord\"," + "\"namespace\":\"com.example\"," | ||
| + "\"fields\":[{\"name\":\"opportunityIds\"," + "\"type\":[{\"type\":\"array\",\"items\":\"long\"}]," | ||
| + "\"doc\":\"List of opportunityIds.\"}]}"; | ||
|
|
||
| // v2: opportunityIds is a plain array (no union wrapper) | ||
| String schemaStr2 = "{\"type\":\"record\",\"name\":\"TestRecord\"," + "\"namespace\":\"com.example\"," | ||
| + "\"fields\":[{\"name\":\"opportunityIds\"," + "\"type\":{\"type\":\"array\",\"items\":\"long\"}," | ||
| + "\"doc\":\"List of opportunityIds.\"}]}"; | ||
|
|
||
| Schema s1 = AvroSchemaParseUtils.parseSchemaFromJSONStrictValidation(schemaStr1); | ||
| Schema s2 = AvroSchemaParseUtils.parseSchemaFromJSONStrictValidation(schemaStr2); | ||
|
|
||
| // Should not throw — these are semantically equivalent | ||
| Schema superset12 = AvroSupersetSchemaUtils.generateSupersetSchema(s1, s2); | ||
| Assert.assertNotNull(superset12); | ||
| Assert.assertNotNull(superset12.getField("opportunityIds")); | ||
|
|
||
| // The result should be an array type (the unwrapped form) | ||
| Assert.assertEquals(superset12.getField("opportunityIds").schema().getType(), Schema.Type.ARRAY); | ||
|
|
||
| // Should also work in the reverse direction | ||
| Schema superset21 = AvroSupersetSchemaUtils.generateSupersetSchema(s2, s1); | ||
| Assert.assertNotNull(superset21); | ||
| Assert.assertEquals(superset21.getField("opportunityIds").schema().getType(), Schema.Type.ARRAY); | ||
|
|
||
| // Both orderings should produce equivalent superset schemas | ||
| Assert.assertTrue(AvroSchemaUtils.compareSchemaIgnoreFieldOrder(superset12, superset21)); | ||
| } |
There was a problem hiding this comment.
The new tests validate that generateSupersetSchema() no longer throws for [T] vs T, but they don’t cover the boolean helper isSupersetSchema() (which currently relies on compareSchemaIgnoreFieldOrder, and may still treat single-element UNION vs bare types as different). Adding assertions for isSupersetSchema() in these scenarios would prevent regressions where superset generation succeeds but superset detection/equality checks still fail.
unwrappedExistingSchema / unwrappedNewSchema make it explicit that these are the post-unwrap versions of the method parameters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rawExistingSchema/rawNewSchema parameters let existingSchema/newSchema locals take the unwrapped values, so the rest of generateSupersetSchema is identical to main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public static Schema generateSupersetSchema(Schema rawExistingSchema, Schema rawNewSchema) { | ||
| // Unwrap single-element unions to their inner type so that [T] is treated | ||
| // equivalently to T during type comparison and superset generation. | ||
| final Schema existingSchema = unwrapSingleElementUnion(rawExistingSchema); | ||
| final Schema newSchema = unwrapSingleElementUnion(rawNewSchema); | ||
|
|
||
| if (existingSchema.getType() != newSchema.getType()) { | ||
| throw new VeniceException("Incompatible schema"); | ||
| } |
There was a problem hiding this comment.
unwrapSingleElementUnion() runs before the type check, so a singleton-union like [T] will be converted to T even when the other side is a real UNION (e.g. [T] vs ["null", T]). That turns a previously-mergeable UNION-vs-UNION case into a hard type mismatch (T vs UNION) and throws Incompatible schema. Consider only unwrapping singleton unions when the other schema is not a UNION, or adding a fallback path that retries the merge using the raw UNION schemas when either input was a singleton-union and the other side is UNION.
| public static Schema generateSupersetSchema(Schema rawExistingSchema, Schema rawNewSchema) { | ||
| // Unwrap single-element unions to their inner type so that [T] is treated | ||
| // equivalently to T during type comparison and superset generation. | ||
| final Schema existingSchema = unwrapSingleElementUnion(rawExistingSchema); | ||
| final Schema newSchema = unwrapSingleElementUnion(rawNewSchema); | ||
|
|
There was a problem hiding this comment.
After this change, generateSupersetSchema() will normalize [T] and T to the same merged output schema, but callers that rely on strict structural equality (e.g. AvroSupersetSchemaUtils.isSupersetSchema() via AvroSchemaUtils.compareSchemaIgnoreFieldOrder) will still treat [T] and T as different and may return false even though the schemas are now merge-compatible. If the intent is to treat singleton-union wrappers as equivalent throughout, isSupersetSchema() (and/or the comparison utility it uses) likely needs the same normalization so behavior is consistent for downstream users.
Problem Statement
generateSupersetSchema()throwsVeniceException("Incompatible schema")when two record schemas have the same field where one version uses a single-element union (e.g.[{"type":"array","items":"long"}]) and the other uses the bare type ({"type":"array","items":"long"}). These are semantically equivalent in Avro, butSchema.getType()returnsUNIONfor the wrapper form andARRAYfor the bare form, so the type check at the top ofgenerateSupersetSchema()rejects them as incompatible.This blocks superset schema generation for any store whose value schema evolved between these two representations.
Solution
Add
unwrapSingleElementUnion()which normalizes a single-element union[T]to its inner typeT. This runs at the start ofgenerateSupersetSchema()before the type comparison, so all downstream logic (type switch, recursive merge) operates on the unwrapped types.The unwrapping is a no-op for schemas that are not single-element unions, so existing behavior is unchanged for all other schema types.
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
Three new unit tests:
testSchemaMergeSingleElementUnionVsPlainType— single-element union array vs plain array, both directionstestSchemaMergeSingleElementUnionVsPlainTypeWithAdditionalFields— same with additional fields to verify superset field merging still workstestSchemaMergeSingleElementUnionMap— single-element union map vs plain mapAll 31 existing
TestAvroSupersetSchemaUtilstests pass (no regressions).Does this PR introduce any user-facing or breaking changes?