GH-891: Add ExtensionTypeWriterFactory to TransferPair #892
67334a6 to 7eba2c1
Hello, @lidavidm! Could you take a look at this PR? Also, I don't have permissions to change the label.
@jhrotko I will take a look at this one as soon as the CI is green (it should be good very soon).
laurentgo
left a comment
I'm not really familiar with Arrow vectors, to be honest, but I wonder why writers aren't discovered at the same time the extension is registered as a type. Wouldn't that make things simpler from an API/usability perspective?
</#list></#list>
public void copyAsValue(StructWriter writer, ExtensionTypeWriterFactory writerFactory) {
I guess this is okay to remove a public method because there has been no release yet?
Maybe we should discuss it on the mailing list, as it seems we haven't found the right pattern yet.
This PR changes how we handle extension type writers in Arrow Java. Instead of using factories that get passed around everywhere, we now let the extension type itself provide its writer.
Problem
In Arrow's type system, each [...] The previous implementation (commits [...]
// Usage in ComplexCopier
writer.addExtensionTypeWriterFactory(extensionTypeWriterFactory);
writer.writeExtension(value);
In this pattern, each extension type had a separate factory class (like [...]
Why the factory pattern wasn't working well
For developers implementing extension types outside of arrow-java, the situation was even more painful. You had to create and manage two separate classes: one for the type itself ([...]
The factory pattern had several issues that made it difficult to scale, especially if you wanted to mix arrow-java extension types with extension types defined outside arrow-java, which may happen more often in the future. The API also got cluttered with factory parameters. Methods like [...]
Finally, the factory pattern created tight coupling between the type definition, the writer implementation, the factory that connects them, and all the code that needs to pass factories around. This made it harder to change any one piece without affecting the others.
The new approach: Let types provide their own writers
I added one abstract method to ExtensionType:
public abstract class ExtensionType extends ArrowType {
// NEW METHOD
public abstract FieldWriter getNewFieldWriter(ValueVector vector);
// Other methods...
}
public class UuidType extends ExtensionType {
@Override
public FieldWriter getNewFieldWriter(ValueVector vector) {
return new UuidWriterImpl((UuidVector) vector);
}
// Other methods...
}
The new approach is simpler because you only need one class per extension type now, not two. The type knows how to create its own writer. This also means the API is cleaner since there are no more factory parameters cluttering everything. For example, [...]
This approach is also consistent with how MinorType works:
// MinorType enum (existing pattern)
public enum MinorType {
INT(new Int(...)) {
@Override
public FieldWriter getNewFieldWriter(ValueVector vector) {
return new IntWriterImpl((IntVector) vector);
}
},
// ...
}
// ExtensionType (new pattern - same idea)
public class UuidType extends ExtensionType {
@Override
public FieldWriter getNewFieldWriter(ValueVector vector) {
return new UuidWriterImpl((UuidVector) vector);
}
}
Finally, there's less coupling overall. Writers don't need to store or manage factories anymore, TransferPair implementations are simpler, and the type information just flows naturally through the [...]
ComplexCopier got simpler
// OLD: Required factory parameter
case EXTENSIONTYPE:
if (extensionTypeWriterFactory == null) {
throw new IllegalArgumentException("Must provide ExtensionTypeWriterFactory");
}
writer.addExtensionTypeWriterFactory(extensionTypeWriterFactory);
writer.writeExtension(value);
break;
// NEW: Type provides the writer
case EXTENSIONTYPE:
if (reader.isSet()) {
Object value = reader.readObject();
if (value != null) {
writer.writeExtension(value, reader.getField().getType());
}
}
break;
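To make the "type provides the writer" idea concrete, here is a minimal, self-contained sketch. It does not use the real Arrow classes: `ToyVector`, `FieldWriter`, `ExtensionType`, `UuidType`, `PointType`, and the static `writeExtension` helper are all hypothetical stand-ins that only mirror the shape of the calls quoted above (`getNewFieldWriter(vector)` and `writeExtension(value, type)`). The point it illustrates is that two different extension types, one of them imagined as living outside the library, can be written through the same code path without any factory being passed in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class ExtensionWriterSketch {

    /** Hypothetical stand-in for a value vector: just collects rendered values. */
    public static final class ToyVector {
        public final List<String> values = new ArrayList<>();
    }

    /** Hypothetical stand-in for a field writer: writes one value at a time. */
    public interface FieldWriter {
        void writeExtension(Object value);
    }

    /** Stand-in for an extension type that knows how to create its own writer. */
    public abstract static class ExtensionType {
        public abstract FieldWriter getNewFieldWriter(ToyVector vector);
    }

    /** An extension type imagined as shipping with the library. */
    public static final class UuidType extends ExtensionType {
        @Override
        public FieldWriter getNewFieldWriter(ToyVector vector) {
            return value -> vector.values.add("uuid:" + (UUID) value);
        }
    }

    /** An extension type imagined as defined by an external developer. */
    public static final class PointType extends ExtensionType {
        @Override
        public FieldWriter getNewFieldWriter(ToyVector vector) {
            return value -> {
                int[] p = (int[]) value;
                vector.values.add("point:" + p[0] + "," + p[1]);
            };
        }
    }

    /** Mirrors the shape of writeExtension(value, type): the type supplies the writer. */
    public static void writeExtension(ToyVector target, Object value, ExtensionType type) {
        if (value != null) { // null values are skipped, as in the NEW branch quoted above
            type.getNewFieldWriter(target).writeExtension(value);
        }
    }

    public static void main(String[] args) {
        ToyVector out = new ToyVector();
        writeExtension(out, new UUID(0L, 1L), new UuidType());
        writeExtension(out, new int[] {3, 4}, new PointType());
        writeExtension(out, null, new UuidType());
        System.out.println(out.values);
    }
}
```

In this toy model, adding a third extension type means adding one class with one overridden method; the copy path itself does not change.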
What's Changed
This PR simplifies extension type writer creation by moving from a factory-based pattern to a type-based pattern. Instead of passing ExtensionTypeWriterFactory instances through multiple API layers, extension types now provide their own writers via a new getNewFieldWriter() method on ArrowType.ExtensionType.
- Added getNewFieldWriter(ValueVector) abstract method to ArrowType.ExtensionType
- Removed ExtensionTypeWriterFactory interface and all implementations
- [...] ComplexCopier, PromotableWriter, and TransferPair APIs
- [...] UnionWriter to support extension types (previously threw UnsupportedOperationException)
- [...] UuidType, OpaqueType)
The factory pattern didn't scale well. Each new extension type required creating a separate factory class and passing it through multiple API layers. This was especially painful for external developers who had to maintain two classes per extension type and manage factory parameters everywhere.
The new approach follows the same pattern as MinorType, where each type knows how to create its own writer. This reduces boilerplate, simplifies the API, and makes it easier to implement custom extension types outside arrow-java.
Closes #891.