-
Notifications
You must be signed in to change notification settings - Fork 85
refactor: increased code sharing between CPU and GPU interpretation in RNTuple reading #1470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…tation of RNTuple data. Changed arrays() argument 'use_GDS' to 'interpreter'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fstrug - nice! Please check some minor comments. Thanks.
Co-authored-by: Ianna Osborne <[email protected]>
Co-authored-by: Ianna Osborne <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Frank! This looks great to me. I like how you factored out some pieces to make the code cleaner.
The only comment I have is that I'm not very sure about the interpreter
keyword argument, since I think it's a bit ambiguous. It might make sense to implement decompression_executor
and interpretation_executor
like how is done for TTrees, but I'm not familiar with those, so I'd have to look into that. So that can just be a follow-up.
At some point we'll have to sit down and fix all the keyword arguments for the common methods (arrays
, iterate
, etc). But I think it's still okay if we change some of the lesser used keyword arguments while we figure things out and people are not regularly working with RNTuples.
Much of the code used to read RNTuple data via cpu and gpu interpretation is common but contained in separate functions. This means any changes made in one workflow will not be automatically reflected in the other. This pr increases function sharing between the two interpretation modes such that any code improvements are automatically shared. A
FieldClusterMetadata
class has been added to contain metadata used in the reading and deserializing of raw page data.I have also changed the
arrays()
argumentuse_GDS
tointerpreter
since it is not strictly necessary to have GDS support for gpu interpretation of RNTuple data and is more reflective of what the argument is controlling.