Support parquet write from Arrow record batch #735

@luoyuxia

Description

Describe the enhancement requested

I'm working on apache/fluss#107, which enables converting Fluss Arrow-structured data to Parquet directly, but I found that the required API is missing here.
Although ARROW-11776 supports writing from an ArrowReader to a file, it reads from the ArrowReader, writes, and closes the file in a single step. That is very coarse-grained: we have almost no control over the writing. Sometimes we want to control when the written Parquet file is closed. It also requires an ArrowReader, and when Arrow RecordBatches are read continuously from a remote server, it is not easy to construct one.
So I think we need to support an interface that writes an Arrow RecordBatch to Parquet via virtual ::arrow::Status WriteRecordBatch(const ::arrow::RecordBatch& batch) = 0
Just as a showcase, the API may look like:

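public interface ArrowBatchParquetWriter extends AutoCloseable {
   // Append one record batch to the Parquet file being written.
   void write(RecordBatch recordBatch);

   // Finish the file; the caller decides when to call this.
   @Override
   void close();
}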
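
To make the intended lifecycle concrete, here is a minimal, self-contained sketch of how a caller might drive such a writer: batches are pushed one at a time as they arrive, and the caller decides when to close. Everything in it is hypothetical and only for illustration. `IntBatch` stands in for an Arrow record batch, `BatchWriter` mirrors the `write`/`close` pair proposed above, and `BufferingBatchWriter` just buffers rows in memory. It does no Parquet encoding and uses no existing Arrow API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Arrow record batch: a single int column.
final class IntBatch {
    final int[] values;

    IntBatch(int... values) {
        this.values = values.clone();
    }
}

// Hypothetical contract mirroring the proposed write/close pair.
interface BatchWriter extends AutoCloseable {
    void write(IntBatch batch);

    @Override
    void close();
}

// In-memory implementation used only to show the lifecycle.
// Assumption: a real writer would encode row groups to a Parquet file instead.
final class BufferingBatchWriter implements BatchWriter {
    private final List<Integer> rows = new ArrayList<>();
    private int batches = 0;
    private boolean closed = false;

    @Override
    public void write(IntBatch batch) {
        if (closed) {
            throw new IllegalStateException("writer is already closed");
        }
        for (int v : batch.values) {
            rows.add(v);
        }
        batches++;
    }

    @Override
    public void close() {
        // Idempotent: calling close() twice is harmless.
        closed = true;
    }

    int rowCount() {
        return rows.size();
    }

    int batchCount() {
        return batches;
    }

    boolean isClosed() {
        return closed;
    }
}

final class BatchWriterDemo {
    public static void main(String[] args) {
        BufferingBatchWriter writer = new BufferingBatchWriter();
        // Batches can arrive one by one, e.g. from a remote server.
        writer.write(new IntBatch(1, 2, 3));
        writer.write(new IntBatch(4, 5));
        // The caller, not the writer, decides when the output is finished.
        writer.close();
        // prints "2 batches, 5 rows"
        System.out.println(writer.batchCount() + " batches, " + writer.rowCount() + " rows");
    }
}
```

The point of the sketch is the shape of the contract: no reader object is needed up front, each `write` call is independent, and writing after `close` is rejected.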
