Description
Describe the enhancement requested
I'm working on apache/fluss#107, which enables converting Fluss Arrow-structured data to Parquet directly, but found that the required API is missing here.
Although ARROW-11776 supports writing from an ArrowReader to a file, it reads from the ArrowReader, writes, and closes the file directly. This is very coarse-grained: we have almost no control over the writing. Sometimes we want to control when the written Parquet file is closed. It also requires an ArrowReader, but if the Arrow RecordBatches are read continuously from a remote server, it is not easy to construct an ArrowReader.
So I think we may need to support an interface for writing an Arrow RecordBatch to Parquet, via virtual ::arrow::Status WriteRecordBatch(const ::arrow::RecordBatch& batch) = 0
Just as a showcase, the API may look like:
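public interface ArrowBatchParquetWriter {
    // Append one record batch to the Parquet file being written.
    void write(RecordBatch recordBatch);

    // Finish the file; the caller decides when this happens.
    void close();
}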
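To make the intended call pattern concrete, here is a minimal, self-contained sketch of how a caller might drive such a push-style writer. Everything in it is hypothetical and for illustration only: `Batch` is a stand-in for an Arrow record batch, `BatchWriter` mirrors the shape proposed above, and `CountingBatchWriter` only counts rows in memory instead of encoding Parquet. None of these names exist in Arrow today.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an Arrow record batch; only the row count matters here.
final class Batch {
    final int rowCount;

    Batch(int rowCount) {
        this.rowCount = rowCount;
    }
}

// Hypothetical push-style writer with the same shape as the proposed API.
interface BatchWriter extends AutoCloseable {
    void write(Batch batch);

    @Override
    void close();
}

// In-memory implementation that only counts rows.
// A real implementation would encode each batch into the Parquet file.
final class CountingBatchWriter implements BatchWriter {
    private long rowsWritten = 0;
    private boolean closed = false;

    @Override
    public void write(Batch batch) {
        if (closed) {
            throw new IllegalStateException("writer is already closed");
        }
        rowsWritten += batch.rowCount;
    }

    @Override
    public void close() {
        closed = true;
    }

    long rowsWritten() {
        return rowsWritten;
    }

    boolean isClosed() {
        return closed;
    }
}

class PushWriterSketch {
    public static void main(String[] args) {
        // Assumed input: batches that arrive one at a time, e.g. from a remote server.
        List<Batch> incoming = Arrays.asList(new Batch(3), new Batch(5), new Batch(2));

        CountingBatchWriter writer = new CountingBatchWriter();
        for (Batch batch : incoming) {
            writer.write(batch); // push each batch as soon as it is available
        }
        writer.close(); // the caller, not a reader, decides when the file is finished

        System.out.println("rows written: " + writer.rowsWritten());
    }
}
```

The point of the sketch is the control flow: no ArrowReader has to be constructed, batches are pushed as they arrive, and closing the file is an explicit step owned by the caller.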