[DNS]: prototype exporting to arrow and re-importing#4947

Draft
LalitMaganti wants to merge 10 commits into main from dev/lalitm/parquet-export

Conversation

@LalitMaganti
Member

  • tp: refactor TarWriter to use virtual sink interface
  • tp: add Arrow IPC serializer and flatbuf reader/writer
  • tp: add Cursor explicit instantiation
  • tp: add TarWriter tests for BufferSink and raw bytes API
  • tp: add ExportToArrow to TraceProcessor API
  • tp: add RPC and HTTP endpoint for ExportToArrow
  • tp: add --export-arrow shell flag
  • tp: add Python API for ExportToArrow

Extract TarWriterSink interface so TarWriter can write to either a file
descriptor or an in-memory buffer. This enables streaming TAR output to
a file without buffering the entire archive in memory, which will be
used by the upcoming parquet export feature.

Add BufferTarWriterSink (public, for in-memory archives) and
FdTarWriterSink (internal, for fd-based output). Also add an
AddFile(name, uint8_t*, size) overload for raw byte data.
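
The sink split described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `Sink`, `BufferSink`, `FileSink` and `MiniTarWriter` are hypothetical names, `FILE*` stands in for a raw file descriptor, and the writer only handles regular files with short names in the ustar layout.

```cpp
// Hypothetical sketch: class names and signatures are illustrative only and
// are not taken from the Perfetto sources.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Abstract destination for archive bytes.
class Sink {
 public:
  virtual ~Sink() = default;
  virtual bool Write(const void* data, size_t size) = 0;
};

// Keeps the whole archive in memory.
class BufferSink : public Sink {
 public:
  bool Write(const void* data, size_t size) override {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    bytes_.insert(bytes_.end(), p, p + size);
    return true;
  }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
};

// Streams to an open FILE*, standing in for an fd-based sink.
class FileSink : public Sink {
 public:
  explicit FileSink(FILE* file) : file_(file) {}
  bool Write(const void* data, size_t size) override {
    return fwrite(data, 1, size, file_) == size;
  }

 private:
  FILE* file_;
};

// Minimal ustar writer: regular files only, names under 100 bytes.
class MiniTarWriter {
 public:
  explicit MiniTarWriter(Sink* sink) : sink_(sink) {}

  bool AddFile(const std::string& name, const uint8_t* data, size_t size) {
    if (name.empty() || name.size() >= 100) return false;
    if (size > 077777777777ull) return false;  // size field holds 11 octal digits
    char h[512] = {};
    memcpy(h, name.data(), name.size());
    snprintf(h + 100, 8, "%07o", 0644u);  // mode
    snprintf(h + 108, 8, "%07o", 0u);     // uid
    snprintf(h + 116, 8, "%07o", 0u);     // gid
    snprintf(h + 124, 12, "%011llo", static_cast<unsigned long long>(size));
    snprintf(h + 136, 12, "%011o", 0u);   // mtime
    memset(h + 148, ' ', 8);              // checksum field counts as spaces
    h[156] = '0';                         // typeflag: regular file
    memcpy(h + 257, "ustar", 6);          // magic, including the trailing NUL
    memcpy(h + 263, "00", 2);             // version
    unsigned sum = 0;
    for (char c : h) sum += static_cast<unsigned char>(c);
    snprintf(h + 148, 7, "%06o", sum);    // six digits + NUL; last byte stays ' '
    static const char kZeros[512] = {};
    const size_t pad = (512 - size % 512) % 512;
    return sink_->Write(h, sizeof(h)) && sink_->Write(data, size) &&
           sink_->Write(kZeros, pad);
  }

  // Two zero-filled blocks mark the end of the archive.
  bool Finish() {
    static const char kZeros[1024] = {};
    return sink_->Write(kZeros, sizeof(kZeros));
  }

 private:
  Sink* sink_;
};
```

Because the writer only sees the `Sink` interface, swapping `BufferSink` for `FileSink` changes where the bytes go without touching the TAR logic, and the file-backed variant never holds more than one block of padding in memory.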
@LalitMaganti LalitMaganti changed the title from "dev/lalitm/parquet export" to "[DNS]: prototype exporting to arrow and re-importing" Feb 27, 2026
@github-actions

github-actions bot commented Feb 27, 2026

Adds arrow_ipc.cc/h which can serialize and deserialize Dataframes
using the Arrow IPC streaming format. Also adds flatbuf_reader and
flatbuf_writer as supporting utilities for the flatbuffer-based Arrow
IPC wire format. Registers the kArrowIpcTraceType for detection.
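
For context, the Arrow IPC streaming format frames each message as a 4-byte continuation marker (0xFFFFFFFF), a little-endian int32 metadata size, the flatbuffer metadata padded to an 8-byte boundary, and then the message body. A stream ends with the marker followed by a zero size. The sketch below shows only that framing. The helper names are hypothetical and the metadata is treated as opaque bytes, whereas a real writer would emit a flatbuffer-encoded `Message` there.

```cpp
// Hypothetical sketch of Arrow IPC stream framing. Helper names are
// illustrative; metadata is opaque here, not a real flatbuffer Message.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kContinuation = 0xFFFFFFFFu;

// Rounds n up to the next multiple of 8.
inline size_t PadTo8(size_t n) {
  return (n + 7) & ~static_cast<size_t>(7);
}

inline void AppendU32Le(std::vector<uint8_t>* out, uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out->push_back(static_cast<uint8_t>(v >> (8 * i)));
}

inline uint32_t ReadU32Le(const std::vector<uint8_t>& in, size_t off) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<uint32_t>(in[off + i]) << (8 * i);
  return v;
}

// Appends bytes followed by zero padding up to an 8-byte boundary.
inline void AppendPadded(std::vector<uint8_t>* out,
                         const std::vector<uint8_t>& bytes) {
  out->insert(out->end(), bytes.begin(), bytes.end());
  out->resize(out->size() + PadTo8(bytes.size()) - bytes.size(), 0);
}

// Writes one encapsulated message: marker, padded metadata size, metadata,
// body. The recorded size includes the metadata padding.
inline void AppendMessage(std::vector<uint8_t>* out,
                          const std::vector<uint8_t>& metadata,
                          const std::vector<uint8_t>& body) {
  AppendU32Le(out, kContinuation);
  AppendU32Le(out, static_cast<uint32_t>(PadTo8(metadata.size())));
  AppendPadded(out, metadata);
  AppendPadded(out, body);
}

// Writes the end-of-stream marker: continuation followed by a zero length.
inline void AppendEndOfStream(std::vector<uint8_t>* out) {
  AppendU32Le(out, kContinuation);
  AppendU32Le(out, 0);
}
```

A reader walks the stream by checking the marker, reading the metadata size, and parsing the metadata to learn the body length. The last step is what needs a flatbuffer reader, since the body length lives inside the `Message` table.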

Add cursor_impl.cc with an explicit template instantiation of
Cursor<ErrorValueFetcher>. This allows export_parquet.cc to link
against the cursor without pulling in all cursor template code.
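
The explicit instantiation pattern can be sketched in a single file as below. The `Cursor` and `ErrorValueFetcher` bodies here are stand-ins invented for illustration and do not reflect the real classes. In a real layout the first part would live in a header and the second in a .cc file such as cursor_impl.cc.

```cpp
// Hypothetical sketch: Cursor and ErrorValueFetcher are simplified stand-ins,
// not the real Perfetto types.
#include <cassert>
#include <cstdint>

// ---- would live in the header ----

// Stand-in fetcher: returns a value derived from the row index.
struct ErrorValueFetcher {
  int64_t Fetch(int64_t row) const { return row * 10; }
};

template <typename Fetcher>
class Cursor {
 public:
  explicit Cursor(Fetcher fetcher) : fetcher_(fetcher) {}
  int64_t Next();  // declared only; the body is not visible to includers

 private:
  Fetcher fetcher_;
  int64_t row_ = 0;
};

// Explicit instantiation declaration: tells includers that some other
// translation unit provides Cursor<ErrorValueFetcher>.
extern template class Cursor<ErrorValueFetcher>;

// ---- would live in the .cc file ----

template <typename Fetcher>
int64_t Cursor<Fetcher>::Next() {
  return fetcher_.Fetch(row_++);
}

// Explicit instantiation definition: emits the code for this one
// specialization so callers only need to link against it.
template class Cursor<ErrorValueFetcher>;
```

Callers that include only the header can use `Cursor<ErrorValueFetcher>` without seeing the member definitions, which keeps the template bodies out of their translation units.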

Add tests for the BufferTarWriterSink (in-memory TAR) and the
AddFile(name, uint8_t*, size) raw bytes overload that were introduced
in the TarWriter refactor.
@LalitMaganti LalitMaganti force-pushed the dev/lalitm/parquet-export branch from 4dfd4e1 to 73cdb34 Compare February 27, 2026 02:40
Adds the ExportToArrow method to the TraceProcessor interface, which
exports all intrinsic tables as a TAR archive of Arrow IPC files.
Uses SerializeToArrowIpc from the dataframe layer for serialization.

Wires up ExportToArrow through the RPC layer (TPM_EXPORT_ARROW) and
adds an /export_to_arrow HTTP endpoint for streaming the TAR archive.

Adds the --export-arrow FILE flag to trace_processor_shell which
exports all intrinsic tables as a TAR archive of Arrow IPC files.

Adds export_to_arrow() to the Python TraceProcessor API which streams
the TAR archive of Arrow IPC files to an output path.
@LalitMaganti LalitMaganti force-pushed the dev/lalitm/parquet-export branch from 73cdb34 to 39c68e8 Compare February 27, 2026 02:43