# **RFC0 for Presto**

See [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on creating your RFC and the process surrounding it.

## Thrift Serialization for TaskStatus, TaskInfo and TaskUpdateRequest

Proposers

* Shang Ma
* Vivian Hsu

## Related Issues

Related issues may include GitHub issues, PRs, or other RFCs.

- prestodb/presto#25020
- prestodb/presto#25079

## Summary

Support thrift serialization of the TaskStatus, TaskInfo, and TaskUpdateRequest classes for the getTaskStatus and createOrUpdateTask APIs, for both Java and C++ worker types, to reduce CPU overhead.
21+
22+
## Background
23+
24+
Presto coordinator sends updates to workers and workers respond with taskInfo. Both the taskUpdateRequest and taskInfo are currently serialized using JSON, which can be CPU intensive. And in the case of high task currency, this can become a bottleneck for the coordinator which in turn becomes a bottleneck for the whole cluster.
25+
26+
27+
### [Optional] Goals
28+
1. Support thrift serde for TaskStatus, TaskInfo, and TaskRequestUpdate classes for both Java and C++ workers
29+
2. Maintain backward compatibility with existing JSON serialization
30+
3. Use drift IDL generator to produce the IDL file and use it to generate c++ classes for native workers
31+
4. Allow multiple serialization formats to coexist
32+
5. Support future serialization formats without SPI changes
33+
6. Allow gradual migration from current design to new design
34+
35+
36+
## Proposed Implementation

### Disclaimer: the following is pseudo code and will differ in the real implementation.

### Current Architecture for JSON Serde
```java
// Use Jackson annotations
public class Split {
    @JsonProperty
    private final ConnectorSplit connectorSplit;
    // ... other fields and methods

    @JsonCreator
    public Split(...);
}
```
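
For context, this is roughly how such an annotated class is serialized today. Below is a minimal sketch using Airlift's `JsonCodec`; the wrapper class and method names are illustrative, not the actual Presto call sites:

```java
// Hedged sketch: JSON round trip of the annotated class via Airlift's JsonCodec.
import com.facebook.airlift.json.JsonCodec;

public final class JsonSerdeSketch
{
    private static final JsonCodec<Split> SPLIT_CODEC = JsonCodec.jsonCodec(Split.class);

    private JsonSerdeSketch() {}

    public static byte[] serialize(Split split)
    {
        return SPLIT_CODEC.toJsonBytes(split); // coordinator -> worker payload
    }

    public static Split deserialize(byte[] data)
    {
        return SPLIT_CODEC.fromJson(data); // worker-side decode
    }
}
```
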
#### For Polymorphic Types, e.g. ConnectorSplit
```java
public class HandleJsonModule
        implements Module
{
    @Override
    public void configure(Binder binder)
    {
        jsonBinder(binder).addModuleBinding().to(TableHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(TableLayoutHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(ColumnHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(SplitJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(OutputTableHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(InsertTableHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(DeleteTableHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(IndexHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(TransactionHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(PartitioningHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(FunctionHandleJacksonModule.class);
        jsonBinder(binder).addModuleBinding().to(MetadataUpdateJacksonModule.class);

        binder.bind(HandleResolver.class).in(Scopes.SINGLETON);
    }
}

// A handle resolver that returns the correct type info at runtime
public HandleResolver()
{
    handleResolvers.put(REMOTE_CONNECTOR_ID.toString(), new MaterializedHandleResolver(new RemoteHandleResolver()));
    handleResolvers.put("$system", new MaterializedHandleResolver(new SystemHandleResolver()));
    handleResolvers.put("$info_schema", new MaterializedHandleResolver(new InformationSchemaHandleResolver()));
    handleResolvers.put("$empty", new MaterializedHandleResolver(new EmptySplitHandleResolver()));

    functionHandleResolvers.put("$static", new MaterializedFunctionHandleResolver(new BuiltInFunctionNamespaceHandleResolver()));
    functionHandleResolvers.put("$session", new MaterializedFunctionHandleResolver(new SessionFunctionHandleResolver()));
}

// Register the correct serde methods for different types
protected AbstractTypedJacksonModule(
        Class<T> baseClass,
        Function<T, String> nameResolver,
        Function<String, Class<? extends T>> classResolver)
{
    super(baseClass.getSimpleName() + "Module", Version.unknownVersion());

    TypeIdResolver typeResolver = new InternalTypeResolver<>(nameResolver, classResolver);

    addSerializer(baseClass, new InternalTypeSerializer<>(baseClass, typeResolver));
    addDeserializer(baseClass, new InternalTypeDeserializer<>(baseClass, typeResolver));
}

// A class to bind the two above together
public class SplitJacksonModule
        extends AbstractTypedJacksonModule<ConnectorSplit>
{
    @Inject
    public SplitJacksonModule(HandleResolver handleResolver)
    {
        super(ConnectorSplit.class,
                handleResolver::getId,
                handleResolver::getSplitClass);
    }
}
```
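
To illustrate the effect of these typed modules, here is a hedged sketch (hypothetical wiring; in Presto the registration happens through Guice, not manually): once the module is registered, an ObjectMapper can serialize the ConnectorSplit interface polymorphically, embedding the type id that HandleResolver maps back to a concrete class.

```java
// Hedged sketch: polymorphic JSON round trip through the typed module.
import com.fasterxml.jackson.databind.ObjectMapper;

public class PolymorphicJsonSketch
{
    public static ConnectorSplit roundTrip(HandleResolver handleResolver, ConnectorSplit split)
            throws Exception
    {
        ObjectMapper mapper = new ObjectMapper();
        // In Presto this registration is done by Guice via jsonBinder
        mapper.registerModule(new SplitJacksonModule(handleResolver));

        // The serialized form carries the type id from handleResolver::getId
        String json = mapper.writeValueAsString(split);

        // Deserialization resolves the concrete class via handleResolver::getSplitClass
        return mapper.readValue(json, ConnectorSplit.class);
    }
}
```
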
### Option 1: Extend Current Architecture for Thrift Serde
```java
// Use Drift annotations alongside the existing Jackson ones
public class Split {
    @JsonProperty
    @ThriftField
    private final ConnectorSplit connectorSplit;
    // ... other fields and methods

    @JsonCreator
    @ThriftConstructor
    public Split(...);
}
```

#### For Polymorphic Types, e.g. ConnectorSplit
```java
// Similarly, we register the correct serde methods for a given type using the
// existing handle resolver; only the Thrift-specific names change relative to
// the Jackson version above
protected AbstractTypedThriftModule(
        Class<T> baseClass,
        Function<T, String> nameResolver,
        Function<String, Class<? extends T>> classResolver)
{
    TypeIdResolver typeResolver = new InternalTypeResolver<>(nameResolver, classResolver);

    addThriftSerializer(baseClass, new InternalTypeThriftSerializer<>(baseClass, typeResolver));
    addThriftDeserializer(baseClass, new InternalTypeThriftDeserializer<>(baseClass, typeResolver));
}
```
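
Under this option, the byte encoding would be driven by Drift. Below is a minimal sketch of how the dual-annotated class might be round-tripped, assuming Drift's ThriftCodecManager; the Split type is the pseudo-code class above and would additionally need a @ThriftStruct annotation:

```java
// Hedged sketch: obtaining a Drift codec for the dual-annotated Split.
import com.facebook.drift.codec.ThriftCodec;
import com.facebook.drift.codec.ThriftCodecManager;

public class ThriftSerdeSketch
{
    private final ThriftCodecManager codecManager = new ThriftCodecManager();

    public ThriftCodec<Split> splitCodec()
    {
        // Drift reflects over @ThriftStruct/@ThriftField/@ThriftConstructor,
        // much like Jackson reflects over @JsonProperty/@JsonCreator
        return codecManager.getCodec(Split.class);
    }
}
```
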
#### Pros
- Follows the existing design in the code base

#### Cons
- If we later want to switch to a different binary serde, we will have to redo the whole process.

### Option 2: Pluggable Serde for Polymorphic Types

```java
import java.util.HashMap;
import java.util.Map;

public class Split
{
    private final ConnectorId connectorId;
    private final ConnectorSplit connectorSplit;
    // ... other fields and methods

    public Split(...);
}

// In presto-spi, an interface for split serde
public interface Serializer<T>
{
    String getType();

    byte[] serialize(T object);

    T deserialize(byte[] data);
}

// A JSON serializer for HiveSplit
public class HiveSplitJsonSerializer
        implements Serializer<HiveSplit>
{
    private final ObjectMapper mapper;

    @Override
    public String getType()
    {
        return "json";
    }

    @Override
    public byte[] serialize(HiveSplit split)
    {
        return mapper.writeValueAsBytes(new HiveSplitSerializable(split));
    }

    @Override
    public HiveSplit deserialize(byte[] data)
    {
        HiveSplitSerializable serializable = mapper.readValue(data, HiveSplitSerializable.class);
        return serializable.toHiveSplit();
    }
}

// A thrift serializer for HiveSplit
public class HiveSplitThriftSerializer
        implements Serializer<HiveSplit>
{
    private final ThriftCodec<ThriftHiveSplit> codec;

    @Override
    public String getType()
    {
        return "thrift";
    }

    @Override
    public byte[] serialize(HiveSplit split)
    {
        ThriftHiveSplit thriftSplit = new ThriftHiveSplit();
        // ... populate fields ...
        return codec.serialize(thriftSplit);
    }

    @Override
    public HiveSplit deserialize(byte[] data)
    {
        ThriftHiveSplit thriftSplit = codec.deserialize(data);
        return new HiveSplit(/* construct from thrift object */);
    }
}

public class ConnectorManager
{
    private synchronized void addConnectorInternal(MaterializedConnector connector)
    {
        // existing code
        // ...
        connector.getSplitSerializerProvider()
                .ifPresent(splitSerializerProvider ->
                        connectorTypeSerdeManager.registerSerializer(connectorId, splitSerializerProvider));
    }
}

// Acts as a registry holding the serde methods, keyed by connector id and serde type
public class ConnectorTypeSerdeManager
{
    private final Map<String, Serializer> serializers = new HashMap<>();

    // Add a custom serializer for a given connector type
    public void registerSerializer(String connectorId, SerializerProvider serializerProvider) {...}

    public Serializer getSerializer(String connectorId) {...}
}

// Register the correct serde method within the corresponding connector factory
public class HiveMetadata
        implements TransactionalMetadata
{
    private final Serializer splitSerializer;

    public HiveMetadata(...)
    {
        this.splitSerializer = new HiveSplitThriftSerializer();
    }

    public Serializer getSplitSerializer()
    {
        return this.splitSerializer;
    }
}

// Use a custom codec for Split
public class SplitCodec
        implements ThriftCodec<Split>
{
    private final ConnectorTypeSerdeManager serdeManager;

    @Override
    public void write(...)
    {
        TMemoryBuffer transport = new TMemoryBuffer(1024);
        TProtocolWriter writer = new TBinaryProtocol(transport);

        // write the connector id/type
        writer.writeStructBegin(new TStruct("Split"));
        writer.writeFieldBegin(new TField("connectorId", TType.STRING, (short) 2));
        writer.writeString("hive");
        writer.writeFieldEnd();

        // write the real data (pseudo code)
        writer.writeBinary(ByteBuffer.wrap(serdeManager.getSerializer(HiveTransactionHandle.class).serialize(aHiveTransactionHandle)));
        writer.writeBinary(ByteBuffer.wrap(serdeManager.getSerializer(HiveSplit.class).serialize(aHiveSplitObject)));
        writer.writeBinary(aLifespan);
        writer.writeBinary(aSplitContext);
        writer.writeStructEnd();
    }

    @Override
    public Split read(...)
    {
        // first, read the connector id to know what type of split we are dealing with
        reader.read(connectorId);

        // say it is a hive split; then
        reader.read(aHiveTransactionHandle);
        reader.read(aHiveSplit);

        // lastly, read the rest
        reader.read(aLifespan);
        reader.read(aSplitContext);
    }
}

// Conceptually, within the byte array from serialization, we will find:
// * String connectorId; // hive or another connector
// * byte[] data;        // the real data of a serialized split
```
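
To make the registry flow concrete, here is a hedged usage sketch built only from the pseudo-code types above; the registerSerializer signature is simplified to take a Serializer directly, and none of this is the final API:

```java
// Hedged usage sketch of the pluggable registry (pseudo, per the disclaimer above).
ConnectorTypeSerdeManager serdeManager = new ConnectorTypeSerdeManager();

// At connector-registration time (ConnectorManager.addConnectorInternal),
// the Hive connector contributes its preferred serializer
serdeManager.registerSerializer("hive", new HiveSplitThriftSerializer());

// Coordinator side: SplitCodec looks the serializer up by connector id and
// frames the opaque bytes with that id on the wire
Serializer<HiveSplit> serializer = serdeManager.getSerializer("hive");
byte[] payload = serializer.serialize(hiveSplit);

// Worker side: read the connector id first, then dispatch to the same registry
HiveSplit restored = serdeManager.getSerializer("hive").deserialize(payload);
```
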
#### Pros
- Each connector can choose its own serialization format
- The internal details of how serialization is handled are hidden
- Connectors can evolve their serialization format independently
- New connectors can adopt newer, more efficient serialization formats without waiting for the entire system to upgrade
- Existing connectors can migrate to better formats without forcing other connectors to change
- Performance optimizations can be made on a per-connector basis

#### Cons
- This design is different from the existing Jackson serde flow in the code base.

### Q & A
1. What modules are involved?
   * presto-spi
   * presto-main-base
   * presto-main
   * presto-hive
2. Any new terminologies/concepts/SQL language additions?
   * N/A
3. Method/class/interface contracts which you deem fit for implementation?
   * See the code examples above
4. Code flow using bullet points or pseudo code as applicable
   * See the code examples above
5. Any new user-facing metrics that can be shown on the CLI or UI?
   * N/A

## [Optional] Metrics

How can we measure the impact of this feature?
1. taskUpdateSerializedCpuNanos
2. taskUpdateDeliveredWallTimeNanos
3. CPU usage for task update serde

## [Optional] Other Approaches Considered
1. See Option 1 above

## Adoption Plan

### Rollout
* As the first step, we will use Drift to annotate all primitive types within the three classes mentioned above, while keeping complicated data types (e.g. Split, MetadataUpdate, TableWriteInfo) serialized as JSON; see the sketch after this list.
* As the second step, we will add thrift support for those complicated data classes using one of the two options proposed above.
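
A hedged illustration of the step-1 hybrid encoding (the class and field names here are hypothetical, chosen only to show the shape): primitives become native thrift fields, while a complicated member stays JSON-encoded inside a string field until step 2.

```java
// Hypothetical step-1 shape: primitives are thrift-native, while the
// complicated member travels as a JSON string until step 2.
import com.facebook.drift.annotations.ThriftConstructor;
import com.facebook.drift.annotations.ThriftField;
import com.facebook.drift.annotations.ThriftStruct;

@ThriftStruct
public class HybridTaskStatus
{
    private final long version;        // primitive: native thrift field
    private final String failuresJson; // complicated type, still JSON-encoded

    @ThriftConstructor
    public HybridTaskStatus(long version, String failuresJson)
    {
        this.version = version;
        this.failuresJson = failuresJson;
    }

    @ThriftField(1)
    public long getVersion()
    {
        return version;
    }

    @ThriftField(2)
    public String getFailuresJson()
    {
        return failuresJson;
    }
}
```
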
- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar?
  * The thrift serde will be disabled by default and can be enabled by a config.
- If we are changing behaviour, how will we phase out the older behaviour?
  * N/A
- If we need special migration tools, describe them here.
  * N/A
- When will we remove the existing behaviour, if applicable?
  * N/A
- How should this feature be taught to new and existing users? Basically, mention if documentation changes or a new blog are needed.
  * This feature will be documented in the Presto documentation.
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
  * N/A

## Test Plan

How do we ensure the feature works as expected? Mention if any functional tests/integration tests are needed. Special mention for product-test changes. If any PoC has been done already, please mention the relevant test results here that you think will bolster your case of getting this RFC approved.

- A PoC for step 1 (primitive types) can be found in the following two PRs:
  * prestodb/presto#25020
  * prestodb/presto#25079
