-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-52448][CONNECT] Add simplified Struct Expression.Literal #51561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5d8e764
to
9d68335
Compare
repeated StructField fields = 3; | ||
|
||
message StructField { | ||
string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this improvement. So we will contain the field name in every element, the space complexity would be O(n) compared to O(1) in the deprecated way? So what's the perf gain from this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the space complexity in the deprecated approach is also O(n), since every element includes the field name in the struct_type field, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, it is not necessary to include the field name/data type in every element, since not setting the data_type field should be sufficient for saving space.
1db4711
to
f38223d
Compare
Merged to master. |
### What changes were proposed in this pull request? This PR adds a new `data_type_struct` field to the protobuf definition for struct literals in Spark Connect, addressing the ambiguity issues with the existing `struct_type` field. The changes include: 1. **Protobuf Schema Update**: Added a new `data_type_struct` field of type `DataType.Struct` to the `Literal.Struct` message in `expressions.proto`, while marking the existing `struct_type` field as deprecated. 2. **Enhanced Struct Conversion Logic**: Updated `LiteralValueProtoConverter.scala` to: - Use the new `data_type_struct` field when available for more precise struct type definition - Maintain backward compatibility by still supporting the deprecated `struct_type` field - Add proper field metadata handling in struct conversions - Improve type inference for struct fields when data types can be inferred from literal values ### Why are the changes needed? The current Expression.Struct literal is somewhat overcomplicated since it duplicates most of the information its fields already have. This is bulky to send over the wire, and it can be ambiguous. ### Does this PR introduce _any_ user-facing change? No. This PR maintains backward compatibility with existing struct literal implementations. Existing code using the deprecated `struct_type` field will continue to work without modification. ### How was this patch tested? `build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"` ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 1.2.4 Closes apache#51561 from heyihong/SPARK-52448. Authored-by: Yihong He <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
This PR adds a new
data_type_struct
field to the protobuf definition for struct literals in Spark Connect, addressing the ambiguity issues with the existingstruct_type
field. The changes include:Protobuf Schema Update: Added a new
data_type_struct
field of typeDataType.Struct
to theLiteral.Struct
message inexpressions.proto
, while marking the existingstruct_type
field as deprecated.Enhanced Struct Conversion Logic: Updated
LiteralValueProtoConverter.scala
to:data_type_struct
field when available for more precise struct type definitionstruct_type
fieldWhy are the changes needed?
The current Expression.Struct literal is somewhat overcomplicated since it duplicates most of the information its fields already have. This is bulky to send over the wire, and it can be ambiguous.
Does this PR introduce any user-facing change?
No. This PR maintains backward compatibility with existing struct literal implementations. Existing code using the deprecated
struct_type
field will continue to work without modification.How was this patch tested?
build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.2.4