[DNS]: prototype exporting to arrow and re-importing #4947
Draft
LalitMaganti wants to merge 10 commits into main from
LalitMaganti (Member) commented Feb 27, 2026
- tp: refactor TarWriter to use virtual sink interface
- tp: add Arrow IPC serializer and flatbuf reader/writer
- tp: add Cursor explicit instantiation
- tp: add TarWriter tests for BufferSink and raw bytes API
- tp: add ExportToArrow to TraceProcessor API
- tp: add RPC and HTTP endpoint for ExportToArrow
- tp: add --export-arrow shell flag
- tp: add Python API for ExportToArrow
Extract TarWriterSink interface so TarWriter can write to either a file descriptor or an in-memory buffer. This enables streaming TAR output to a file without buffering the entire archive in memory, which will be used by the upcoming parquet export feature. Add BufferTarWriterSink (public, for in-memory archives) and FdTarWriterSink (internal, for fd-based output). Also add an AddFile(name, uint8_t*, size) overload for raw byte data.
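The sink split described above can be illustrated with a small Python analogue. This is only a sketch of the idea, not the C++ code in this PR: `TarSink`, `BufferSink`, `FdSink` and `SimpleTarWriter` are hypothetical names, and the header bytes come from Python's standard `tarfile` module rather than from TarWriter.

```python
import io
import os
import tarfile
from abc import ABC, abstractmethod


class TarSink(ABC):
    """Destination for raw TAR bytes (hypothetical analogue of TarWriterSink)."""

    @abstractmethod
    def write(self, data: bytes) -> None:
        ...


class BufferSink(TarSink):
    """Accumulates the whole archive in memory."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data

    def getvalue(self) -> bytes:
        return bytes(self._buf)


class FdSink(TarSink):
    """Writes straight to a file descriptor, so nothing is buffered here."""

    def __init__(self, fd: int) -> None:
        self._fd = fd

    def write(self, data: bytes) -> None:
        view = memoryview(data)
        while view:  # os.write may accept fewer bytes than requested
            written = os.write(self._fd, view)
            view = view[written:]


class SimpleTarWriter:
    """Emits ustar entries to whichever sink it was given."""

    BLOCK = 512

    def __init__(self, sink: TarSink) -> None:
        self._sink = sink

    def add_file(self, name: str, data: bytes) -> None:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        self._sink.write(info.tobuf(format=tarfile.USTAR_FORMAT))
        self._sink.write(data)
        # File contents are padded with zeros to a 512-byte boundary.
        self._sink.write(b"\0" * (-len(data) % self.BLOCK))

    def close(self) -> None:
        # A TAR archive ends with two all-zero blocks.
        self._sink.write(b"\0" * (2 * self.BLOCK))
```

The writer only ever calls `write()`, which is what lets the same code path serve both in-memory archives and streamed fd output.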
Adds arrow_ipc.cc/h which can serialize and deserialize Dataframes using the Arrow IPC streaming format. Also adds flatbuf_reader and flatbuf_writer as supporting utilities for the flatbuffer-based Arrow IPC wire format. Registers the kArrowIpcTraceType for detection.
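For readers unfamiliar with the wire format, the framing that the Arrow IPC streaming format puts around each flatbuffer `Message` can be sketched in Python as follows. This is not the logic in arrow_ipc.cc; `frame_metadata` and `read_prefix` are hypothetical helpers, and the metadata bytes are treated as opaque. The body length is stored inside the flatbuffer itself, so this sketch cannot walk a full stream without a flatbuffer reader.

```python
import struct

# Every encapsulated message starts with this 32-bit continuation marker.
CONTINUATION = 0xFFFFFFFF

# End-of-stream is the marker followed by a zero metadata length.
END_OF_STREAM = struct.pack("<Ii", CONTINUATION, 0)


def frame_metadata(metadata: bytes) -> bytes:
    """Prefix already-serialized Message flatbuffer bytes for the stream.

    The recorded size includes the zero padding that rounds the metadata
    up to a multiple of 8 bytes.
    """
    padded = metadata + b"\0" * (-len(metadata) % 8)
    return struct.pack("<Ii", CONTINUATION, len(padded)) + padded


def read_prefix(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (metadata_size, metadata_start) for the frame at `offset`.

    A metadata_size of 0 signals end-of-stream.
    """
    marker, size = struct.unpack_from("<Ii", buf, offset)
    if marker != CONTINUATION:
        raise ValueError("missing Arrow IPC continuation marker")
    return size, offset + 8
```

A check of the leading marker like the one in `read_prefix` is one plausible basis for format detection, but how kArrowIpcTraceType is actually detected is not stated here.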
Add cursor_impl.cc with an explicit template instantiation of Cursor<ErrorValueFetcher>. This allows export_parquet.cc to link against the cursor without pulling in all cursor template code.
Add tests for the BufferTarWriterSink (in-memory TAR) and the AddFile(name, uint8_t*, size) raw bytes overload that were introduced in the TarWriter refactor.
4dfd4e1 to 73cdb34
Adds the ExportToArrow method to the TraceProcessor interface, which exports all intrinsic tables as a TAR archive of Arrow IPC files. Uses SerializeToArrowIpc from the dataframe layer for serialization.
Wires up ExportToArrow through the RPC layer (TPM_EXPORT_ARROW) and adds an /export_to_arrow HTTP endpoint for streaming the TAR archive.
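A client could consume the streamed endpoint without holding the archive in memory along these lines. Only the `/export_to_arrow` path comes from this PR; the HTTP method, the base URL and the helper names `save_stream` and `export_over_http` are assumptions made for illustration.

```python
import urllib.request


def save_stream(src, out_path: str, chunk_size: int = 1 << 20) -> int:
    """Copy a readable binary stream to out_path in chunks; return bytes written."""
    total = 0
    with open(out_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            total += len(chunk)
    return total


def export_over_http(base_url: str, out_path: str, method: str = "GET") -> int:
    """Fetch the TAR archive from a running trace processor HTTP server.

    Assumption: the endpoint needs no request body. The method the server
    actually expects is not stated in the PR text, so it is a parameter.
    """
    req = urllib.request.Request(f"{base_url}/export_to_arrow", method=method)
    with urllib.request.urlopen(req) as resp:
        return save_stream(resp, out_path)
```

Usage would look like `export_over_http("http://127.0.0.1:9001", "tables.tar")`, where the address is whatever the server was started on.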
Adds the --export-arrow FILE flag to trace_processor_shell which exports all intrinsic tables as a TAR archive of Arrow IPC files.
Adds export_to_arrow() to the Python TraceProcessor API which streams the TAR archive of Arrow IPC files to an output path.
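Once the archive is on disk, it can be unpacked with the standard library alone. The sketch below assumes nothing about how members are named inside the archive, since the PR text does not say; `iter_arrow_members` is a hypothetical helper, not part of the Python TraceProcessor API.

```python
import io
import tarfile


def iter_arrow_members(fileobj):
    """Yield (member_name, raw_bytes) for each regular file in a TAR archive."""
    with tarfile.open(fileobj=fileobj, mode="r:") as tf:
        for member in tf:
            if member.isfile():
                yield member.name, tf.extractfile(member).read()


# If pyarrow is installed, each payload can be decoded as an IPC stream:
#   import pyarrow as pa
#   table = pa.ipc.open_stream(io.BytesIO(raw_bytes)).read_all()
```

With a file written by `export_to_arrow()`, this would be driven as `for name, raw in iter_arrow_members(open(path, "rb")): ...`.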
73cdb34 to 39c68e8